CodeQL documentation

CodeQL library for JavaScript

When you’re analyzing a JavaScript program, you can make use of the large collection of classes in the CodeQL library for JavaScript.

Overview

There is an extensive CodeQL library for analyzing JavaScript code. The classes in this library present the data from a CodeQL database in an object-oriented form and provide abstractions and predicates to help you with common analysis tasks.

The library is implemented as a set of QL modules, that is, files with the extension .qll. The module javascript.qll imports most other standard library modules, so you can include the complete library by beginning your query with:

import javascript

The rest of this tutorial briefly summarizes the most important classes and predicates provided by this library, including references to the detailed API documentation where applicable.

Introducing the library

The CodeQL library for JavaScript presents information about JavaScript source code at different levels:

  • Textual — classes that represent source code as unstructured text files

  • Lexical — classes that represent source code as a series of tokens and comments

  • Syntactic — classes that represent source code as an abstract syntax tree

  • Name binding — classes that represent scopes and variables

  • Control flow — classes that represent the flow of control during execution

  • Data flow — classes that you can use to reason about data flow in JavaScript source code

  • Type inference — classes that you can use to approximate types for JavaScript expressions and variables

  • Call graph — classes that represent the caller-callee relationship between functions

  • Inter-procedural data flow — classes that you can use to define inter-procedural data flow and taint tracking analyses

  • Frameworks — classes that represent source code entities that have a special meaning to JavaScript tools and frameworks

Note that representations above the textual level (for example the lexical representation or the flow graphs) are only available for JavaScript code that does not contain fatal syntax errors. For code with such errors, the only information available is at the textual level, as well as information about the errors themselves.

Additionally, there is library support for working with HTML documents, JSON and YAML data, JSDoc comments, and regular expressions.

Textual level

At its most basic level, a JavaScript code base can simply be viewed as a collection of files organized into folders, where each file is composed of zero or more lines of text.

Note that the textual content of a program is not included in the CodeQL database unless you specifically request it during extraction.

Files and folders

In the CodeQL libraries, files are represented as entities of class File, and folders as entities of class Folder, both of which are subclasses of class Container.

Class Container provides the following member predicates:

  • Container.getParentContainer() returns the parent folder of the file or folder.

  • Container.getAFile() returns a file within the folder.

  • Container.getAFolder() returns a folder nested within the folder.

Note that while getAFile and getAFolder are declared on class Container, they currently only have results for Folders.

Both files and folders have paths, which can be accessed by the predicate Container.getAbsolutePath(). For example, if f represents a file with the path /home/user/project/src/index.js, then f.getAbsolutePath() evaluates to the string "/home/user/project/src/index.js", while f.getParentContainer().getAbsolutePath() returns "/home/user/project/src".

These paths are absolute file system paths. If you want to obtain the path of a file relative to the source location in the CodeQL database, use Container.getRelativePath() instead. Note, however, that a database may contain files that are not located underneath the source location; for such files, getRelativePath() will not return anything.

The following member predicates of class Container provide more information about the name of a file or folder:

  • Container.getBaseName() returns the base name of a file or folder, not including its parent folder, but including its extension. In the above example, f.getBaseName() would return the string "index.js".

  • Container.getStem() is similar to Container.getBaseName(), but it does not include the file extension; so f.getStem() returns "index".

  • Container.getExtension() returns the file extension, not including the dot; so f.getExtension() returns "js".

For example, the following query computes, for each folder, the number of JavaScript files (that is, files with extension js) contained in the folder:

import javascript

from Folder d
select d.getRelativePath(), count(File f | f = d.getAFile() and f.getExtension() = "js")

When you run the query on most projects, the results include folders that contain files with a js extension and folders that don’t.
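The name-related predicates described above can be combined in a similar way. The following is a minimal sketch that lists JavaScript files whose base name suggests a minified bundle. The ".min.js" naming convention is an assumption made for illustration, not something the library checks for you.

```ql
import javascript

// List files whose base name ends in ".min.js", together with the
// different name components exposed by class Container.
from File f
where f.getBaseName().matches("%.min.js")
select f, f.getStem(), f.getExtension(), f.getParentContainer().getAbsolutePath()
```

The pattern passed to matches uses % as a wildcard for any sequence of characters.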

Locations

Most entities in a CodeQL database have an associated source location. Locations are identified by five pieces of information: a file, a start line, a start column, an end line, and an end column. Line and column counts are 1-based (so the first character of a file is at line 1, column 1), and the end position is inclusive.

All entities associated with a source location belong to the class Locatable. The location itself is modeled by the class Location and can be accessed through the member predicate Locatable.getLocation(). The Location class provides the following member predicates:

  • Location.getFile(), Location.getStartLine(), Location.getStartColumn(), Location.getEndLine(), Location.getEndColumn() return detailed information about the location.

  • Location.getNumLines() returns the number of (whole or partial) lines covered by the location.

  • Location.startsBefore(Location) and Location.endsAfter(Location) determine whether one location starts before or ends after another location.

  • Location.contains(Location) indicates whether one location completely contains another location; l1.contains(l2) holds if, and only if, l1.startsBefore(l2) and l1.endsAfter(l2).
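As a brief illustration of these predicates, the following sketch reports functions whose location covers many lines. The threshold of 100 lines is an arbitrary value chosen for this example.

```ql
import javascript

// Report functions whose source location covers more than 100 lines.
// The threshold is arbitrary and only serves to illustrate the API.
from Function f, Location loc
where
  loc = f.getLocation() and
  loc.getNumLines() > 100
select f,
  "This function spans " + loc.getNumLines().toString() + " lines, starting at line " +
    loc.getStartLine().toString() + "."
```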

Lines

Lines of text in files are represented by the class Line. This class offers the following member predicates:

  • Line.getText() returns the text of the line, excluding any terminating newline characters.

  • Line.getTerminator() returns the terminator character(s) of the line. The last line in a file may not have any terminator characters, in which case this predicate does not return anything; otherwise it returns either the two-character string "\r\n" (carriage-return followed by newline), or one of the one-character strings "\n" (newline), "\r" (carriage-return), "\u2028" (Unicode character LINE SEPARATOR), "\u2029" (Unicode character PARAGRAPH SEPARATOR).

Note that, as mentioned above, the textual representation of the program is not included in the CodeQL database by default.
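Assuming the database was built with textual content included, a query over class Line might look like the following sketch, which reports lines that end in a carriage-return followed by a newline:

```ql
import javascript

// Report lines terminated by "\r\n".
// This has no results unless the textual content was extracted.
from Line l
where l.getTerminator() = "\r\n"
select l, "This line uses a carriage-return/newline terminator."
```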

Lexical level

A slightly more structured view of a JavaScript program is provided by the classes Token and Comment, which represent tokens and comments, respectively.

Tokens

The most important member predicates of class Token are as follows:

  • Token.getValue() returns the source text of the token.

  • Token.getIndex() returns the index of the token within its enclosing script.

  • Token.getNextToken() and Token.getPreviousToken() navigate between tokens.

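The Token class has nine subclasses, each representing a particular kind of token:

  • EOFToken: a marker token representing the end of a script

  • NullLiteralToken, BooleanLiteralToken, NumericLiteralToken, StringLiteralToken and RegularExpressionToken: different kinds of literals

  • IdentifierToken and KeywordToken: identifiers and keywords (including reserved words), respectively

  • PunctuatorToken: operators and other punctuation symbols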

As an example of a query operating entirely on the lexical level, consider the following query, which finds consecutive comma tokens arising from an omitted element in an array expression:

import javascript

// A punctuator token whose source text is a single comma.
class CommaToken extends PunctuatorToken {
    CommaToken() {
        this.getValue() = ","
    }
}

from CommaToken comma
where comma.getNextToken() instanceof CommaToken
select comma, "Omitted array elements are bad style."

If the query returns no results, this pattern isn’t used in the projects that you analyzed.

You can use the predicates Locatable.getFirstToken() and Locatable.getLastToken() to access the first and last token (if any) belonging to an element with a source location.
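For instance, the following sketch uses getLastToken() to find expression statements that do not end with an explicit semicolon. It assumes that the location of an expression statement includes its terminating semicolon when one is present.

```ql
import javascript

// Report expression statements whose last token is not a semicolon.
// Assumption: the statement's location covers the semicolon if it is written.
from ExprStmt s
where s.getLastToken().getValue() != ";"
select s, "This expression statement does not end with a semicolon."
```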

Comments

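The class Comment and its subclasses represent the different kinds of comments that can occur in JavaScript programs:

  • Comment: any comment

    • LineComment: a single-line comment terminated by an end-of-line character

      • SlashSlashComment: a plain JavaScript single-line comment starting with //

      • HtmlLineComment: a (non-standard) HTML comment

        • HtmlCommentStart: an HTML comment starting with <!--

        • HtmlCommentEnd: an HTML comment starting with -->

    • BlockComment: a block comment potentially spanning multiple lines

      • SlashStarComment: a plain JavaScript block comment surrounded with /*...*/

      • DocComment: a documentation block comment surrounded with /**...*/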

The most important member predicates are as follows:

  • Comment.getText() returns the source text of the comment, not including delimiters.

  • Comment.getLine(i) returns the ith line of text within the comment (0-based).

  • Comment.getNumLines() returns the number of lines in the comment.

  • Comment.getNextToken() returns the token immediately following a comment. Note that such a token always exists: if a comment appears at the end of a file, its following token is an EOFToken.

As an example of a query using only lexical information, consider the following query for finding HTML comments, which are not a standard ECMAScript feature and should be avoided:

import javascript

from HtmlLineComment c
select c, "Do not use HTML comments."
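The member predicates of Comment listed earlier can be used in the same style. As a sketch, the following query reports comments whose text contains the word TODO. The choice of marker word is an assumption made for illustration.

```ql
import javascript

// Report comments that mention TODO anywhere in their text.
// "(?s)" lets "." match newlines, so multi-line comments are covered too.
from Comment c
where c.getText().regexpMatch("(?s).*\\bTODO\\b.*")
select c, "This " + c.getNumLines().toString() + "-line comment contains a TODO marker."
```

Note that regexpMatch must match the entire comment text, which is why the pattern begins and ends with ".*".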

Syntactic level

Most classes in the JavaScript library are concerned with representing a JavaScript program as a collection of abstract syntax trees (ASTs).

The class ASTNode contains all entities representing nodes in the abstract syntax trees and defines generic tree traversal predicates:

  • ASTNode.getChild(i): returns the ith child of this AST node.

  • ASTNode.getAChild(): returns any child of this AST node.

  • ASTNode.getParent(): returns the parent node of this AST node, if any.

Note

These predicates should only be used to perform generic AST traversal. To access children of specific AST node types, the specialized predicates introduced below should be used instead. In particular, queries should not rely on the numeric indices of child nodes relative to their parent nodes: these are considered an implementation detail that may change between versions of the library.

Top-levels

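From a syntactic point of view, each JavaScript program is composed of one or more top-level code blocks (or top-levels for short), which are blocks of JavaScript code that do not belong to a larger code block. Top-levels are represented by the class TopLevel and its subclasses:

  • TopLevel

    • Script: a stand-alone file or HTML <script> element

      • ExternalScript: a stand-alone JavaScript file

      • InlineScript: code embedded inline in an HTML <script> tag

    • CodeInAttribute: a code block originating from an HTML attribute value

      • EventHandlerCode: code from an event handler attribute such as onload

      • JavaScriptURL: code from a URL with the javascript: scheme

    • Externs: a JavaScript file containing externs definitions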

Every TopLevel is contained in a File, but a single File may contain more than one TopLevel. To go from a TopLevel tl to its File, use tl.getFile(); conversely, for a File f, predicate f.getATopLevel() returns a top-level contained in f. For every AST node, predicate ASTNode.getTopLevel() can be used to find the top-level it belongs to.

The TopLevel class additionally provides the following member predicates:

  • TopLevel.getNumberOfLines() returns the total number of lines (including code, comments and whitespace) in the top-level.

  • TopLevel.getNumberOfLinesOfCode() returns the number of lines of code, that is, lines that contain at least one token.

  • TopLevel.getNumberOfLinesOfComments() returns the number of lines containing or belonging to a comment.

  • TopLevel.isMinified() determines whether the top-level contains minified code, using a heuristic based on the average number of statements per line.

Note

By default, GitHub code scanning filters out alerts in minified top-levels, since they are often hard to interpret. When you write your own queries in Visual Studio Code, this filtering is not done automatically, so you may want to explicitly add a condition of the form and not e.getTopLevel().isMinified() or similar to your query to exclude results in minified code.
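As an illustrative sketch, the following query combines the TopLevel predicates above to list basic size metrics for every top-level that is not classified as minified:

```ql
import javascript

// For each non-minified top-level, report its file and line counts.
from TopLevel tl
where not tl.isMinified()
select tl.getFile(), tl.getNumberOfLines(), tl.getNumberOfLinesOfCode(),
  tl.getNumberOfLinesOfComments()
```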

Statements and expressions

The most important subclasses of ASTNode besides TopLevel are Stmt and Expr, which, together with their subclasses, represent statements and expressions, respectively. This section briefly discusses some of the more important classes and predicates. For a full reference of all the subclasses of Stmt and Expr and their API, see Stmt.qll and Expr.qll.
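To give a sense of how these classes are used in practice, here is a small sketch that finds if statements whose "then" branch is an empty block. It relies only on IfStmt.getThen() and BlockStmt.getStmt(int), both of which are described in the list that follows.

```ql
import javascript

// Report if statements whose "then" branch is a block with no statements.
from IfStmt i, BlockStmt b
where
  b = i.getThen() and
  not exists(b.getStmt(_))
select i, "This if statement has an empty 'then' branch."
```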

  • Stmt: use Stmt.getContainer() to access the innermost function or top-level in which the statement is contained.

    • ControlStmt: a statement that controls the execution of other statements, that is, a conditional, loop, try or with statement; use ControlStmt.getAControlledStmt() to access the statements that it controls.

      • IfStmt: an if statement; use IfStmt.getCondition(), IfStmt.getThen() and IfStmt.getElse() to access its condition expression, “then” branch and “else” branch, respectively.

      • LoopStmt: a loop; use LoopStmt.getBody() and LoopStmt.getTest() to access its body and its test expression, respectively.

        • WhileStmt, DoWhileStmt: a “while” or “do-while” loop, respectively.

        • ForStmt: a “for” statement; use ForStmt.getInit() and ForStmt.getUpdate() to access the init and update expressions, respectively.

        • EnhancedForLoop: a “for-in” or “for-of” loop; use EnhancedForLoop.getIterator() to access the loop iterator (which may be an expression or a variable declaration), and EnhancedForLoop.getIterationDomain() to access the expression being iterated over.

      • WithStmt: a “with” statement; use WithStmt.getExpr() and WithStmt.getBody() to access the controlling expression and the body of the with statement, respectively.

      • SwitchStmt: a switch statement; use SwitchStmt.getExpr() to access the expression on which the statement switches; use SwitchStmt.getCase(int) and SwitchStmt.getACase() to access individual switch cases; each case is modeled by an entity of class Case, whose member predicates Case.getExpr() and Case.getBodyStmt(int) provide access to the expression checked by the switch case (which is undefined for default), and its body.

      • TryStmt: a “try” statement; use TryStmt.getBody(), TryStmt.getCatchClause() and TryStmt.getFinally() to access its body, “catch” clause and “finally” block, respectively.

    • BlockStmt: a block of statements; use BlockStmt.getStmt(int) to access the individual statements in the block.

    • ExprStmt: an expression statement; use ExprStmt.getExpr() to access the expression itself.

    • JumpStmt: a statement that disrupts structured control flow, that is, one of break, continue, return and throw; use predicate JumpStmt.getTarget() to determine the target of the jump, which is either a statement or (for return and uncaught throw statements) the enclosing function.

      • BreakStmt: a “break” statement; use BreakStmt.getLabel() to access its (optional) target label.

      • ContinueStmt: a “continue” statement; use ContinueStmt.getLabel() to access its (optional) target label.

      • ReturnStmt: a “return” statement; use ReturnStmt.getExpr() to access its (optional) result expression.

      • ThrowStmt: a “throw” statement; use ThrowStmt.getExpr() to access its thrown expression.

    • FunctionDeclStmt: a function declaration statement; see below for available member predicates.

    • ClassDeclStmt: a class declaration statement; see below for available member predicates.

    • DeclStmt: a declaration statement containing one or more declarators which can be accessed by predicate DeclStmt.getDeclarator(int).

  • Expr: use Expr.getEnclosingStmt() to obtain the innermost statement to which this expression belongs; Expr.isPure() determines whether the expression is side-effect-free.