Java: Automodel Framework Mode Extraction Queries#12830

Merged
kaeluka merged 40 commits into main from kaeluka/parameter-candidate-extraction
May 11, 2023

@kaeluka kaeluka commented Apr 14, 2023

For extraction of automodel candidates from method declarations in "framework mode".

This PR contributes three new queries that extract a) candidates, b) positive prompt samples, c) negative prompt samples.

What's new here: in framework mode, the 'endpoints' we consider are no longer the arguments passed to method calls, but the parameter declarations of methods.
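To make the distinction concrete, here is a minimal Java sketch. It is not taken from this PR: `FileStore`, `resolveEntry`, and `Caller` are hypothetical names chosen for illustration. In framework mode the endpoints would be the parameter declarations `baseDir` and `entryName`; in the argument-based ('application') mode they would be the expressions passed at the call site.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical library class, used only to illustrate the two kinds of endpoint.
class FileStore {
    // Framework mode: the parameter declarations `baseDir` and `entryName`
    // of this public method are the endpoints under consideration.
    public static Path resolveEntry(String baseDir, String entryName) {
        return Paths.get(baseDir).resolve(entryName).normalize();
    }
}

// Hypothetical client code that calls into the library.
class Caller {
    static Path lookup(String userInput) {
        // Argument-based mode: the arguments `"/data"` and `userInput`
        // at this call site are the endpoints under consideration.
        return FileStore.resolveEntry("/data", userInput);
    }
}
```

The practical difference is that framework mode needs only the library's own source: every parameter of its public interface can be examined without any client code that calls it.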

Notes:

  • I built this code in a fairly modular way — the AutomodelSharedCharacteristics module is language agnostic (it might be moved to a different location later on).
    • The idea is that, by implementing an AutomodelSharedCharacteristics::CandidateSig, you can derive a bunch of characteristics 'for free'. Enough, I expect, to have useful behaviour straight out of the gate. The AutomodelSharedCharacteristics::CandidateSig interface is very generic, so it should be possible to implement it for a very wide range of use cases. If you can think of important use cases where we'd not be able to implement AutomodelSharedCharacteristics::CandidateSig, that'd be useful feedback.
    • Even though you get characteristics for free, it's as easy as ever to add your own language-specific ones by extending the abstract characteristics classes exported from your instantiation of AutomodelSharedCharacteristics::SharedCharacteristics. All the characteristics in AutomodelEndpointCharacteristics.qll are examples of how to do this.
  • The modular approach would be useful, should we decide to move the other mode (where endpoints are arguments) here as well. AFAICT, this transition should be rather easy.
  • The most recent DCA experiment running the three queries on nightly.yml is at https://github.com/github/codeql-dca-main/issues/12741. A Java Team reviewer might want to use this to a) download a SARIF file and b) confirm, for an arbitrary class or two, that the whole public interface is exported as candidates (except for things that are already modeled; those should be exported as positive or negative samples).
  • All the Java-specific imports in java/ql/src/Telemetry/AutomodelEndpointCharacteristics.qll are private imports. This means that the queries will be as language agnostic as possible, and we should be able to very quickly port the query files to other modes or even languages later on :) There are other ways to achieve easy portability, but for the time being I chose to go with the simple route.
  • The most noteworthy feature I couldn't port over from the 'application mode' branch (tiferet/codex) is marking known sanitizers as non-sinks. This is because sanitizers are not implemented in MaD, and the manually implemented QL predicates surface arguments, not parameters. When testing this, we should see how much of a problem this is in practice. If it is a problem, we should discuss either how much work it would take to have sanitizer MaD models, or whether to use an approximate solution that would work for many cases, but not all. I'm optimistic it won't come to that, though 🤞

@github-actions github-actions bot added the Java label Apr 14, 2023
Comment thread java/ql/src/Telemetry/ExtractAutomodelCandidates.ql Fixed
@kaeluka kaeluka force-pushed the kaeluka/parameter-candidate-extraction branch from 4388659 to 0849bc8 Compare April 14, 2023 10:42
@owen-mc owen-mc changed the title ExtractAutomodelCandidates.ql query Java: ExtractAutomodelCandidates.ql query Apr 14, 2023
@kaeluka kaeluka force-pushed the kaeluka/parameter-candidate-extraction branch 4 times, most recently from 57e5c39 to daddfb7 Compare April 14, 2023 14:37
Comment thread java/ql/src/Telemetry/ExtractAutomodelCandidates.ql Fixed

@github-advanced-security github-advanced-security AI left a comment

CodeQL found more than 10 potential problems in the proposed changes. Check the Files changed tab for more details.

@kaeluka kaeluka force-pushed the kaeluka/parameter-candidate-extraction branch 2 times, most recently from 2307b1b to 8da200b Compare April 24, 2023 14:20
Author

kaeluka commented Apr 25, 2023

https://github.com/github/codeql-dca-main/issues/12666 DCA experiment containing some data to use

@kaeluka kaeluka force-pushed the kaeluka/parameter-candidate-extraction branch 2 times, most recently from a1f7ed3 to df434d6 Compare April 25, 2023 14:10
Author

kaeluka commented Apr 25, 2023

https://github.com/github/codeql-dca-main/issues/12694 <- DCA experiment that will hopefully also contain positive training data once finished :-)

@kaeluka kaeluka force-pushed the kaeluka/parameter-candidate-extraction branch 2 times, most recently from eaec7ea to 001e2d8 Compare April 27, 2023 07:55
Comment thread java/ql/src/Telemetry/AutomodelEndpointCharacteristics.qll Fixed
@kaeluka kaeluka force-pushed the kaeluka/parameter-candidate-extraction branch from 511c4f7 to afaa8c5 Compare April 27, 2023 12:42
Comment thread java/ql/src/Telemetry/AutomodelSharedCharacteristics.qll Fixed
Comment thread java/ql/src/Telemetry/AutomodelSharedCharacteristics.qll Fixed
@kaeluka kaeluka force-pushed the kaeluka/parameter-candidate-extraction branch from afaa8c5 to adcf4a3 Compare April 27, 2023 12:48
@kaeluka kaeluka changed the title Java: ExtractAutomodelCandidates.ql query Java: Automodel Framework Mode Extraction Queries Apr 27, 2023
@kaeluka kaeluka marked this pull request as ready for review April 28, 2023 07:37
Author

kaeluka commented May 10, 2023

Jean, I've renamed the query files

Comment thread java/ql/src/Telemetry/AutomodelSharedUtil.qll Fixed
Comment thread java/ql/src/Telemetry/AutomodelFrameworkModeCharacteristics.qll
Author

kaeluka commented May 11, 2023

I ran a DCA experiment, and the metadata is passed correctly! See backref.

@atorralba, I think this is now ready to merge. Could I get an approval? 👍

@atorralba
Contributor

Do you mind if I merge my outstanding QLDoc suggestions first?

Author

kaeluka commented May 11, 2023

Which ones do you mean, @atorralba? Have I missed any? Sure, go ahead 👍

Comment thread java/ql/src/Telemetry/AutomodelSharedCharacteristics.qll Outdated
atorralba previously approved these changes May 11, 2023
Comment thread java/ql/src/Telemetry/AutomodelSharedCharacteristics.qll Fixed
Comment thread java/ql/src/Telemetry/AutomodelFrameworkModeCharacteristics.qll Fixed
Author

kaeluka commented May 11, 2023

@atorralba sorry to be a bother, but ql-for-ql complained; tests were green otherwise. Could you re-approve?

Contributor

@jhelie jhelie left a comment

🚀

@kaeluka kaeluka merged commit 510febf into main May 11, 2023
@kaeluka kaeluka deleted the kaeluka/parameter-candidate-extraction branch May 11, 2023 16:00

Labels

Java, no-change-note-required (This PR does not need a change note)
