Customizing Library Models for Python

Beta Notice - Unstable API

Library customization using data extensions is currently in beta and subject to change.

Breaking changes to this format may occur while in beta.

Python analysis can be customized by adding library models in data extension files.

A data extension for Python is a YAML file of the form:

extensions:
  - addsTo:
      pack: codeql/python-all
      extensible: <name of extensible predicate>
    data:
      - <tuple1>
      - <tuple2>
      - ...

The CodeQL library for Python exposes the following extensible predicates:

  • sourceModel(type, path, kind)

  • sinkModel(type, path, kind)

  • typeModel(type1, type2, path)

  • summaryModel(type, path, input, output, kind)

  • barrierModel(type, path, kind)

  • barrierGuardModel(type, path, acceptingValue, kind)

The examples below show how to use these predicates. Reference material is provided at the end of this article.
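Because each extensible predicate expects a fixed number of columns, a quick arity check can catch malformed tuples before a data extension is used. The following is a minimal sketch, not part of CodeQL: `EXPECTED_COLUMNS` and `check_extension` are hypothetical names, the column counts are read off the predicate signatures listed above, and the dictionary stands in for one entry of an already-parsed YAML file.

```python
# Column counts taken from the predicate signatures listed above.
EXPECTED_COLUMNS = {
    "sourceModel": 3,
    "sinkModel": 3,
    "typeModel": 3,
    "summaryModel": 5,
    "barrierModel": 3,
    "barrierGuardModel": 4,
}


def check_extension(extension):
    """Return a list of problems found in one parsed 'extensions' entry."""
    name = extension["addsTo"]["extensible"]
    expected = EXPECTED_COLUMNS.get(name)
    if expected is None:
        return [f"unknown extensible predicate: {name}"]
    problems = []
    for index, row in enumerate(extension.get("data", [])):
        if len(row) != expected:
            problems.append(
                f"{name} row {index}: expected {expected} columns, got {len(row)}"
            )
    return problems


# Hypothetical, already-parsed equivalent of one data extension entry.
example = {
    "addsTo": {"pack": "codeql/python-all", "extensible": "sinkModel"},
    "data": [
        ["fabric", "Member[operations].Member[sudo].Argument[0]", "command-injection"],
        ["fabric", "Member[operations].Member[sudo].Argument[0]"],  # 'kind' is missing
    ],
}

problems = check_extension(example)
for problem in problems:
    print(problem)
```

The check only counts columns. It does not verify that a type, access path, or kind is meaningful to the analysis.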

Example: Taint sink in the ‘fabric’ package

In this example, we’ll show how to add the following argument, passed to sudo from the fabric package, as a command-line injection sink:

from fabric.operations import sudo
sudo(cmd) # <-- add 'cmd' as a taint sink

This sink is already recognized by the CodeQL Python analysis. For the purposes of this example, though, you could model it yourself by adding a tuple to the sinkModel(type, path, kind) extensible predicate in a data extension file:

extensions:
  - addsTo:
      pack: codeql/python-all
      extensible: sinkModel
    data:
      - ["fabric", "Member[operations].Member[sudo].Argument[0]", "command-injection"]

  • The first column, "fabric", identifies a set of values from which to begin the search for the sink. The string "fabric" means we start at the places where the codebase imports the package fabric.

  • The second column is an access path that is evaluated from left to right, starting at the values that were identified by the first column.

    • Member[operations] selects accesses to the operations module.

    • Member[sudo] selects accesses to the sudo function in the operations module.

    • Argument[0] selects the first argument to calls to that function.

  • "command-injection" indicates that this is considered a sink for the command injection query.
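To make the left-to-right reading of the access path concrete, the sketch below splits a path into its components in evaluation order. It is an illustration only, not CodeQL's own parser: `split_access_path` is a hypothetical helper, and it assumes that component arguments never contain a dot or a closing bracket.

```python
import re

# Matches one component: a name, optionally followed by a bracketed argument,
# for example 'Member[sudo]' or a bare name without brackets.
_COMPONENT = re.compile(r"([A-Za-z]+)(?:\[([^\]]*)\])?")


def split_access_path(path):
    """Split an access path into (name, argument) pairs, left to right."""
    if path == "":
        return []
    components = []
    for token in path.split("."):
        match = _COMPONENT.fullmatch(token)
        if match is None:
            raise ValueError(f"unrecognised component: {token!r}")
        # The argument is None for components written without brackets.
        components.append((match.group(1), match.group(2)))
    return components


steps = split_access_path("Member[operations].Member[sudo].Argument[0]")
for name, argument in steps:
    print(name, argument)
```

Each pair corresponds to one step of the search described above: first the operations module, then the sudo function, then the first argument of calls to it.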

Example: Taint sink in the ‘invoke’ package

Often sinks are found as arguments to methods rather than functions. In this example, we’ll show how to add the following argument, passed to run from the invoke package, as a command-line injection sink:

import invoke
c = invoke.Context()
c.run(cmd) # <-- add 'cmd' as a taint sink

This sink is also already recognized by the CodeQL Python analysis. For the purposes of this example, though, you could model it yourself by adding a tuple to the sinkModel(type, path, kind) extensible predicate in a data extension file:

extensions:
  - addsTo:
      pack: codeql/python-all
      extensible: sinkModel
    data:
      - ["invoke", "Member[Context].Instance.Member[run].Argument[0]", "command-injection"]

  • The first column, "invoke", begins the search at places where the codebase imports the package invoke.

  • The second column is an access path that is evaluated from left to right, starting at the values that were identified by the first column.

    • Member[Context] selects accesses to the Context class.

    • Instance selects instances of the Context class.

    • Member[run] selects accesses to the run method in the Context class.

    • Argument[0] selects the first argument to calls to that method.

  • "command-injection" indicates that this is considered a sink for the command injection query.

The Instance component selects instances of a class, including instances of its subclasses. Because methods on instances are common targets, a more compact syntax is available: the first column (the type) may contain a dotted path ending in a class name, which begins the search at instances of that class. Using this syntax, the previous example could be written as:

extensions:
  - addsTo:
      pack: codeql/python-all
      extensible: sinkModel
    data:
      - ["invoke.Context", "Member[run].Argument[0]", "command-injection"]
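The relationship between the compact and the long form can be sketched as a small rewriting function. This is an illustration of the equivalence described above, not how CodeQL implements it: `expand_compact_type` is a hypothetical helper, and it assumes that every dotted segment after the package name is a plain member access.

```python
def expand_compact_type(type_name, path):
    """Rewrite a compact (type, path) pair into the long package-rooted form.

    'invoke.Context' is read as: start at the package 'invoke', select the
    member 'Context', then select instances of that class.
    """
    package, *members = type_name.split(".")
    if not members:
        # A bare package name has nothing to expand.
        return package, path
    components = [f"Member[{member}]" for member in members] + ["Instance"]
    if path:
        components.append(path)
    return package, ".".join(components)


expanded = expand_compact_type("invoke.Context", "Member[run].Argument[0]")
print(expanded)
```

For the compact tuple above, the result is the same type and access path that the earlier, longer sink model spelled out by hand.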

Continued example: Multiple ways to obtain a type

The invoke package provides multiple ways to obtain a Context instance. The following snippet obtains one through the invoke.context module instead:

from invoke import context
c = context.Context()
c.run(cmd) # <-- add 'cmd' as a taint sink

Compared to the previous Python snippet, the Context class is now found as invoke.context.Context instead of invoke.Context. We could add a data extension similar to the previous one, but with the type invoke.context.Context. Alternatively, we can use the typeModel(type1, type2, path) extensible predicate to describe how to reach invoke.Context from invoke.context.Context:

extensions:
  - addsTo:
      pack: codeql/python-all
      extensible: typeModel
    data:
      - ["invoke.Context", "invoke.context.Context", ""]

  • The first column, "invoke.Context", is the name of the type to reach.

  • The second column, "invoke.context.Context", is the name of the type from which to evaluate the path.

  • The third column is just an empty string, indicating that any instance of invoke.context.Context is also an instance of invoke.Context.

Combined with the sink model we added earlier, this type model allows the sink in the example to be detected.
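One way to picture the effect of an empty-path typeModel row is as an alias: any model written for the first type also applies to values of the second type. The sketch below computes the set of type names that count as a given target type. It is a simplified mental model, not CodeQL's implementation: `types_reaching` is a hypothetical helper, rows with a non-empty path are ignored, and chaining rows transitively is an assumption made for illustration.

```python
def types_reaching(target, type_models):
    """Return the type names whose instances also count as `target`.

    Only rows with an empty path are followed, since those state that every
    instance of type2 is also an instance of type1.
    """
    reached = {target}
    changed = True
    while changed:
        changed = False
        for type1, type2, path in type_models:
            if path == "" and type1 in reached and type2 not in reached:
                reached.add(type2)
                changed = True
    return reached


# The typeModel row from the data extension above, as a Python tuple.
type_models = [("invoke.Context", "invoke.context.Context", "")]

aliases = types_reaching("invoke.Context", type_models)
print(sorted(aliases))
```

Under this reading, the sink model for invoke.Context covers both spellings of the class, which is why no second sink model is needed.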

Example: Taint sources from Django ‘upload_to’ argument

This example is a bit more advanced, involving both a callback function and a class constructor. The Django web framework allows you to specify a function that determines the path where uploaded files are stored (see the Django documentation). This function is passed as an argument to the FileField constructor. The function is called with two arguments: the instance of the model and the filename of the uploaded file. This filename is what we want to mark as a taint source. An example use looks as follows:

from django.db import models

def user_directory_path(instance, filename): # <-- add 'filename' as a taint source
  # file will be uploaded to MEDIA_ROOT/user_<id>/<filename>
  return "user_{0}/{1}".format(instance.user.id, filename)

class MyModel(models.Model):
  upload = models.FileField(upload_to