Customizing Library Models for Python
Beta Notice - Unstable API
Library customization using data extensions is currently in beta and subject to change.
Breaking changes to this format may occur while in beta.
Python analysis can be customized by adding library models in data extension files.
A data extension for Python is a YAML file of the form:
extensions:
  - addsTo:
      pack: codeql/python-all
      extensible: <name of extensible predicate>
    data:
      - <tuple1>
      - <tuple2>
      - ...
The CodeQL library for Python exposes the following extensible predicates:
- sourceModel(type, path, kind)
- sinkModel(type, path, kind)
- typeModel(type1, type2, path)
- summaryModel(type, path, input, output, kind)
- barrierModel(type, path, kind)
- barrierGuardModel(type, path, acceptingValue, kind)
We’ll explain how to use these predicates through a few examples, and provide some reference material at the end of this article.
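As a rough illustration of the file format, the following Python sketch renders a data extension as text and checks that each tuple has as many columns as the predicate signature listed above. The helper name render_extension and the PREDICATE_ARITY table are hypothetical and not part of CodeQL; the row used in the usage note is the fabric sink from the first example below.

```python
import json

# Column counts taken from the predicate signatures listed above.
PREDICATE_ARITY = {
    "sourceModel": 3,
    "sinkModel": 3,
    "typeModel": 3,
    "summaryModel": 5,
    "barrierModel": 3,
    "barrierGuardModel": 4,
}


def render_extension(extensible, rows, pack="codeql/python-all"):
    """Render one data extension as YAML text (hypothetical helper)."""
    arity = PREDICATE_ARITY[extensible]
    for row in rows:
        if len(row) != arity:
            raise ValueError(
                f"{extensible} expects {arity} columns, got {len(row)}: {row!r}"
            )
    lines = [
        "extensions:",
        "  - addsTo:",
        f"      pack: {pack}",
        f"      extensible: {extensible}",
        "    data:",
    ]
    # A JSON array of strings is also a valid YAML flow sequence.
    lines += [f"      - {json.dumps(list(row))}" for row in rows]
    return "\n".join(lines) + "\n"
```

Calling render_extension("sinkModel", [["fabric", "Member[operations].Member[sudo].Argument[0]", "command-injection"]]) should produce text with the same shape as the hand-written files in this article. The arity check only counts columns; it does not validate access paths or kinds.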
Example: Taint sink in the ‘fabric’ package
In this example, we’ll show how to add the following argument, passed to sudo from the fabric package, as a command-line injection sink:
from fabric.operations import sudo
sudo(cmd) # <-- add 'cmd' as a taint sink
Note that this sink is already recognized by the CodeQL Python analysis. To model it yourself, you would add a tuple to the sinkModel(type, path, kind) extensible predicate in a data extension file:
extensions:
  - addsTo:
      pack: codeql/python-all
      extensible: sinkModel
    data:
      - ["fabric", "Member[operations].Member[sudo].Argument[0]", "command-injection"]
- The first column, "fabric", identifies a set of values from which to begin the search for the sink. The string "fabric" means we start at the places where the codebase imports the package fabric.
- The second column is an access path that is evaluated from left to right, starting at the values that were identified by the first column.
  - Member[operations] selects accesses to the operations module.
  - Member[sudo] selects accesses to the sudo function in the operations module.
  - Argument[0] selects the first argument to calls to that function.
- "command-injection" indicates that this is considered a sink for the command injection query.
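To make the left-to-right reading concrete, here is a minimal Python sketch that splits an access path into its components in evaluation order. The function split_access_path is a hypothetical illustration and only handles the simple Name and Name[argument] forms used in this article; it is not how CodeQL itself parses access paths.

```python
import re

# One component: a name, optionally followed by a bracketed argument.
_COMPONENT = re.compile(r"([A-Za-z]+)(?:\[([^\]]*)\])?")


def split_access_path(path):
    """Split an access path into (name, argument) pairs, left to right."""
    components = []
    pos = 0
    while pos < len(path):
        match = _COMPONENT.match(path, pos)
        if match is None:
            raise ValueError(f"unexpected text at offset {pos} in {path!r}")
        # The argument is None for components without brackets.
        components.append((match.group(1), match.group(2)))
        pos = match.end()
        if pos < len(path):
            if path[pos] != "." or pos + 1 == len(path):
                raise ValueError(f"malformed separator at offset {pos} in {path!r}")
            pos += 1  # skip the '.' between components
    return components


for name, argument in split_access_path("Member[operations].Member[sudo].Argument[0]"):
    print(name, argument)
```

Each printed pair corresponds to one step of the search described above: first the operations member, then the sudo member, then argument 0 of calls to it.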
Example: Taint sink in the ‘invoke’ package
Often sinks are found as arguments to methods rather than functions. In this example, we’ll show how to add the following argument, passed to run from the invoke package, as a command-line injection sink:
import invoke
c = invoke.Context()
c.run(cmd) # <-- add 'cmd' as a taint sink
Note that this sink is already recognized by the CodeQL Python analysis. To model it yourself, you would add a tuple to the sinkModel(type, path, kind) extensible predicate in a data extension file:
extensions:
  - addsTo:
      pack: codeql/python-all
      extensible: sinkModel
    data:
      - ["invoke", "Member[Context].Instance.Member[run].Argument[0]", "command-injection"]
- The first column, "invoke", begins the search at places where the codebase imports the package invoke.
- The second column is an access path that is evaluated from left to right, starting at the values that were identified by the first column.
  - Member[Context] selects accesses to the Context class.
  - Instance selects instances of the Context class.
  - Member[run] selects accesses to the run method in the Context class.
  - Argument[0] selects the first argument to calls to that method.
- "command-injection" indicates that this is considered a sink for the command injection query.
Note that the Instance component is used to select instances of a class, including instances of its subclasses.
Since methods on instances are common targets, we have a more compact syntax for selecting them. The first column, the type, is allowed to contain a dotted path ending in a class name.
This will begin the search at instances of that class. Using this syntax, the previous example could be written as:
extensions:
  - addsTo:
      pack: codeql/python-all
      extensible: sinkModel
    data:
      - ["invoke.Context", "Member[run].Argument[0]", "command-injection"]
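The relationship between the long and compact forms can be sketched in Python as a rewrite of the first two columns. The helper to_compact below is hypothetical and rests on one assumption drawn from the text above: a leading run of Member[...] components followed by Instance can be folded into a dotted type name. It is an illustration of the equivalence, not a tool shipped with CodeQL.

```python
import re

# A leading 'Member[name].' component with a plain identifier as its argument.
_LEADING_MEMBER = re.compile(r"Member\[([A-Za-z_][A-Za-z0-9_]*)\]\.")


def to_compact(type_name, path):
    """Fold a leading 'Member[a].Member[B].Instance' prefix into the type column."""
    dotted = type_name
    rest = path
    while True:
        match = _LEADING_MEMBER.match(rest)
        if match is None:
            break
        dotted += "." + match.group(1)
        rest = rest[match.end():]
    # Only rewrite when the member chain is directly followed by Instance.
    if dotted != type_name and (rest == "Instance" or rest.startswith("Instance.")):
        return dotted, rest[len("Instance."):]
    return type_name, path


print(to_compact("invoke", "Member[Context].Instance.Member[run].Argument[0]"))
```

Paths that do not pass through Instance, such as the fabric path from the first example, are returned unchanged, because the compact syntax described here applies to instances of a class.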
Continued example: Multiple ways to obtain a type
The invoke package provides multiple ways to obtain a Context instance. The following example shows how to add a new way to obtain a Context instance:
from invoke import context
c = context.Context()
c.run(cmd) # <-- add 'cmd' as a taint sink
Compared to the previous Python snippet, the Context class is now found as invoke.context.Context instead of invoke.Context.
We could add a data extension similar to the previous one, but with the type invoke.context.Context.
However, we can also use the typeModel(type1, type2, path) extensible predicate to describe how to reach invoke.Context from invoke.context.Context:
extensions:
  - addsTo:
      pack: codeql/python-all
      extensible: typeModel
    data:
      - ["invoke.Context", "invoke.context.Context", ""]
- The first column, "invoke.Context", is the name of the type to reach.
- The second column, "invoke.context.Context", is the name of the type from which to evaluate the path.
- The third column is just an empty string, indicating that any instance of invoke.context.Context is also an instance of invoke.Context.
Combined with the sink model we added earlier, this type model lets the analysis detect the sink in the example.
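One way to picture how the two models combine is the following Python sketch. It treats each typeModel row with an empty path as the statement "instances of type2 are also instances of type1" and collects every type that a model written for a target type would therefore cover. The function types_covered_by is a hypothetical mental model, not a description of how the CodeQL libraries evaluate type models, and it ignores rows with a non-empty path.

```python
def types_covered_by(target, type_models):
    """Return every type whose instances also count as instances of `target`."""
    covered = {target}
    changed = True
    while changed:
        changed = False
        for type1, type2, path in type_models:
            # An empty path means: any instance of type2 is an instance of type1.
            if path == "" and type1 in covered and type2 not in covered:
                covered.add(type2)
                changed = True
    return covered


type_models = [("invoke.Context", "invoke.context.Context", "")]
print(sorted(types_covered_by("invoke.Context", type_models)))
```

Under this reading, the sink model written for invoke.Context also applies to values of type invoke.context.Context, which is why no second sink row is needed.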
Example: Taint sources from Django ‘upload_to’ argument
This example is a bit more advanced, involving both a callback function and a class constructor.
The Django web framework allows you to specify a function that determines the path where uploaded files are stored (see the Django documentation).
This function is passed as an argument to the FileField constructor.
The function is called with two arguments: the instance of the model and the filename of the uploaded file.
This filename is what we want to mark as a taint source. An example use looks as follows:
from django.db import models
def user_directory_path(instance, filename): # <-- add 'filename' as a taint source
    # file will be uploaded to MEDIA_ROOT/user_<id>/<filename>
    return "user_{0}/{1}".format(instance.user.id, filename)
class MyModel(models.Model):
    upload = models.FileField(upload_to