Skip to content

JS: Add routing trees library#7049

Merged
codeql-ci merged 37 commits intogithub:mainfrom
asgerf:js/routing-trees
Dec 17, 2021
Merged

JS: Add routing trees library#7049
codeql-ci merged 37 commits intogithub:mainfrom
asgerf:js/routing-trees

Conversation

@asgerf
Copy link
Copy Markdown
Contributor

@asgerf asgerf commented Nov 3, 2021

This introduces the Routing library, modeling the composition of middleware functions and route handlers, and helps reason about data flow between these.

Previously we were missing data flow in cases like this:

app.use((req, res) => {
  req.data = req.query.x;
});
app.get('/', (req, res) => {
  let x = req.data; // <-- missed flow from req.query.x
});

Data-flow configurations as well as API graphs can now track flow between the two functions above.

What's in this PR

  • The Routing library with a bunch of abstract classes for framework models to extend.
  • It is instantiated for Express and Fastify for now. There are more framework models, but I had to stop creeping the scope of the PR.
  • The framework models themselves haven't changed much; they just contribute to the routing tree model, which then contributes new data-flow and API graph steps.
  • The CSRF and rate limiting queries now use the routing tree model.
  • req.body is now seen as a deeply tainted object in more cases, as the logic for detecting the body parser is now based on routing trees. We get some new results because of this.

Routing tree examples

The structure of the router hierarchy is seen as a tree, with each node being a function that can dispatch the incoming request to its children, or pass it on to its next sibling. We a graph-structure used to represent the possible trees that can be constructed at runtime. (Branching control-flow can cause the graph to be non-tree shaped)

For example:

app.use(foo());
app.get(bar());
app.get('/', a, b, c);

would generate a tree of form:

  • app
    • app.use()
      • foo()
    • app.get()
      • bar()
    • app.get('/)
      • a
      • b
      • c

Now suppose foo() is defined to return another router:

function foo() {
  let router = express.Router();
  router.use(...);
  return router;
}

The inner router would appear as a child in the overall routing tree:

  • app
    • app.use()
      • foo()
        • router
          • router.use(...)
    • ...

In general, if there is flow from A to B, then A becomes a child of B. For example,

function A(req, res) {}
let B1 = A;
app.use(B1);
let B2 = A;
app.use(B2);

The hierarchy would be as follows, with A occurring in two different places, but their corresponding use-sites (B1, B2) still explicit in the tree.

  • app
    • app.use()
      • B1
        • A
    • app.use()
      • B2
        • A

(In this example, the explicit let bindings were just there to make it easier to refer to the use-sites, but separate use-site nodes would exist even if the let bindings were inlined)

If a router is returned, the data-flow rule wires things up nicely:

function makeRouter() {
  let router = express.Router();
  router.use(A);
  return router;
}
app.use(makeRouter());
  • app
    • app.use()
      • makeRouter()
        • router
          • router.use()
            • A

An interesting case is when a mutable router object escapes into another function. In this case, we need interprocedural reasoning about side effects, since ordering of middlewares reflects the order in which certain calls are made. We simplify this by creating a separate node for each function that touches the router object, and at the point where the router escapes, we add a child relationship between caller and callee. For example:

function addMiddlewares(router) {
  router.use(A);
  router.use(B);
}
app.use(cookie());
addMiddlewares(app);
app.get('/', ...)
  • app
    • app.use(cookie())
    • addMiddlewares(app) (point of escape)
      • router (callee's view of router)
        • router.use(A)
        • router.use(B)
    • app.get('/', ...)

With the above tree, we can see that A,B come after cookie() but before the get('/') handler.

Evaluations (internal links)

  • On EJS slugs we get 17 new alerts, not counting the 465 missing rate-limiting alerts which is a bit ridiculous. Mostly TPs as far as I can tell, although some of them are existing alerts with a new source detected through middleware-flow.
  • Default slugs shows good performance and some new SQL injections, and both new TPs and fixed FPs for missing CSRF middleware

@asgerf asgerf added JS JS:changes-sources-or-sinks Changes taint sources/sinks for the JS analysis labels Nov 3, 2021
@asgerf asgerf force-pushed the js/routing-trees branch 2 times, most recently from a3374f2 to 57cc611 Compare November 4, 2021 13:19
@asgerf asgerf marked this pull request as ready for review November 8, 2021 11:19
@asgerf asgerf requested a review from a team as a code owner November 8, 2021 11:19
Copy link
Copy Markdown
Contributor

@erik-krogh erik-krogh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great 🎉

This is just a preliminary commit-by-commit review with some small comments.
I'll take another look at the entire thing later.

Comment thread javascript/ql/lib/semmle/javascript/Routing.qll Outdated
Comment thread javascript/ql/lib/semmle/javascript/Routing.qll Outdated
Comment thread javascript/ql/lib/semmle/javascript/Routing.qll Outdated
Comment on lines +32 to +34
// The callback parameter is either a MongoClient or Db depending on the mongodb package version,
// but we just model it as both.
result = getAMongoDbCallback().getParameter(1)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a motivating example (or test) for this change?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The db parameter on this line is actually a client/connection, so the previous filter led to a FN there.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually digging into this a bit further, it looks like mongodb@2.x had a .db method on a database, and had no separate concept for a connection/client. I'll update the model to reflect this.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went ahead and merged the two concepts entirely

import semmle.javascript.security.dataflow.MissingRateLimiting
import semmle.javascript.RestrictedLocations

// TODO: Previously we used (FirstLineOf) here, but that doesn't work anymore.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can just remove this TODO.

Co-authored-by: Erik Krogh Kristensen <erik-krogh@github.com>
@erik-krogh erik-krogh self-assigned this Dec 13, 2021
Copy link
Copy Markdown
Contributor

@erik-krogh erik-krogh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very, very nice work 👍

It takes a bit to understand how the routing-trees are recursively created, but it's nice.

I only have some minor comments.
Should be good to merge afterwards.

Comment thread javascript/ql/lib/semmle/javascript/frameworks/Express.qll Outdated
Comment thread javascript/ql/lib/semmle/javascript/frameworks/Express.qll
Comment thread javascript/ql/lib/semmle/javascript/frameworks/Fastify.qll Outdated
Comment thread javascript/ql/lib/semmle/javascript/frameworks/Fastify.qll Outdated
Comment thread javascript/ql/lib/semmle/javascript/Routing.qll Outdated
Comment thread javascript/ql/lib/semmle/javascript/Routing.qll
Comment thread javascript/ql/lib/semmle/javascript/Routing.qll Outdated
Comment thread javascript/ql/lib/semmle/javascript/Routing.qll Outdated
Comment thread javascript/ql/lib/semmle/javascript/Routing.qll Outdated
@asgerf
Copy link
Copy Markdown
Contributor Author

asgerf commented Dec 16, 2021

Thanks for the thorough review @erik-krogh! All comments addressed, and I've started another round of evaluations.

Copy link
Copy Markdown
Contributor

@erik-krogh erik-krogh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

@asgerf
Copy link
Copy Markdown
Contributor Author

asgerf commented Dec 17, 2021

Updated evaluations:

  • Default slugs still looks ok
  • Fastify slugs shows 2 new XSS-like alerts which seem plausible, though the projects are hard to setup so I haven't verified

@codeql-ci codeql-ci merged commit 39ec713 into github:main Dec 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation JS:changes-sources-or-sinks Changes taint sources/sinks for the JS analysis JS

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants