Conversation
While we want the CodeQL Action to work with third-party language support, having a list of all built-in languages can help us create better type-level checks to ensure that we don't miss things that we want to customize for each of our built-in languages.
There was a problem hiding this comment.
Pull request overview
This PR renames the “known language” concept to “built-in language” and introduces a curated, updateable source of truth for all built-in CodeQL languages (plus aliases) to enable stricter type-level exhaustiveness checks across the Action.
Changes:
- Replace
KnownLanguagewithBuiltInLanguagethroughout the TypeScript codebase and tests. - Add
src/languages/builtin.json+parseBuiltInLanguage/isBuiltInLanguageutilities, with unit tests to keep the enum and JSON synchronized. - Update the bundle update workflow to regenerate the built-in language list via a new script.
Show a summary per file
| File | Description |
|---|---|
| src/trap-caching.test.ts | Update tests to use BuiltInLanguage instead of KnownLanguage. |
| src/tracer-config.test.ts | Update tests to use BuiltInLanguage. |
| src/status-report.test.ts | Update tests to use BuiltInLanguage. |
| src/start-proxy/environment.ts | Switch language-specific proxy environment checks to BuiltInLanguage. |
| src/start-proxy/environment.test.ts | Update start-proxy environment tests for BuiltInLanguage. |
| src/start-proxy.ts | Replace KnownLanguage usage with BuiltInLanguage; remove old parseLanguage logic in favor of centralized parsing. |
| src/start-proxy.test.ts | Update tests to use BuiltInLanguage; remove parseLanguage tests (now covered in languages module tests). |
| src/start-proxy-action.ts | Use parseBuiltInLanguage and BuiltInLanguage for the language input. |
| src/overlay/index.test.ts | Update overlay tests to use BuiltInLanguage. |
| src/languages/index.ts | Introduce BuiltInLanguage, isBuiltInLanguage, and parseBuiltInLanguage, backed by builtin.json. |
| src/languages/index.test.ts | Add unit tests for parsing, membership checks, and enum↔JSON synchronization. |
| src/languages/builtin.json | New curated list of built-in languages and aliases. |
| src/languages.ts | Remove the old KnownLanguage/JavaEnvVars definitions (moved to src/languages/index.ts). |
| src/known-language-aliases.json | Remove the old alias JSON (replaced by src/languages/builtin.json). |
| src/init.ts | Update language checks to BuiltInLanguage. |
| src/init.test.ts | Update tests/types to BuiltInLanguage. |
| src/init-action.ts | Update language-specific behavior checks to BuiltInLanguage. |
| src/dependency-caching.ts | Update language-specific feature gating to BuiltInLanguage. |
| src/dependency-caching.test.ts | Update dependency caching tests to BuiltInLanguage. |
| src/database-upload.test.ts | Update database upload tests to BuiltInLanguage. |
| src/config/db-config.test.ts | Update config parsing tests to BuiltInLanguage. |
| src/config-utils.ts | Update supported-language filtering and language-specific logic to BuiltInLanguage. |
| src/config-utils.test.ts | Update config utils tests to BuiltInLanguage. |
| src/codeql.test.ts | Update CodeQL wrapper tests to BuiltInLanguage. |
| src/autobuild.ts | Update autobuild ordering and language checks to BuiltInLanguage. |
| src/analyze.ts | Update status-report typing and extraction conditionals to BuiltInLanguage. |
| src/analyze.test.ts | Update analysis tests to BuiltInLanguage. |
| src/analyze-action.ts | Update Go legacy workflow logic to use BuiltInLanguage. |
| pr-checks/sync.ts | Update PR checks generator typing to BuiltInLanguage. |
| package.json | Extend transpile script to also typecheck workflow scripts tsconfig. |
| .github/workflows/update-bundle.yml | Replace alias-update step with built-in-language regeneration via new script. |
| .github/workflows/script/update-builtin-languages.ts | New script to regenerate src/languages/builtin.json using the CodeQL CLI. |
| .github/workflows/script/tsconfig.json | New tsconfig for typechecking workflow scripts (no emit). |
| lib/upload-sarif-action.js | Generated output update (not reviewed). |
| lib/upload-sarif-action-post.js | Generated output update (not reviewed). |
| lib/upload-lib.js | Generated output update (not reviewed). |
| lib/start-proxy-action.js | Generated output update (not reviewed). |
| lib/start-proxy-action-post.js | Generated output update (not reviewed). |
| lib/setup-codeql-action.js | Generated output update (not reviewed). |
| lib/resolve-environment-action.js | Generated output update (not reviewed). |
| lib/init-action.js | Generated output update (not reviewed). |
| lib/init-action-post.js | Generated output update (not reviewed). |
| lib/autobuild-action.js | Generated output update (not reviewed). |
| lib/analyze-action.js | Generated output update (not reviewed). |
| lib/analyze-action-post.js | Generated output update (not reviewed). |
Copilot's findings
- Files reviewed: 33/45 changed files
- Comments generated: 1
| @@ -0,0 +1,88 @@ | |||
| #!/usr/bin/env npx tsx | |||
There was a problem hiding this comment.
The shebang #!/usr/bin/env npx tsx is not portable: most env implementations can’t take multiple arguments from a shebang line (without -S), so running this script directly would fail. Either remove the shebang (since the workflow already runs it via npx tsx ...) or switch to an env -S style shebang.
| #!/usr/bin/env npx tsx |
| const resolveJson: Record<string, string[]> = JSON.parse( | ||
| execFileSync( | ||
| codeqlPath, | ||
| ["resolve", "languages", "--format=json"], | ||
| { encoding: "utf8" }, | ||
| ), | ||
| ); |
There was a problem hiding this comment.
typically new languages start out as experimental, and as such require CODEQL_ENABLE_EXPERIMENTAL_FEATURES variable to be resolved. To be future ready, we probably need to set that here
| export enum BuiltInLanguage { | ||
| actions = "actions", | ||
| cpp = "cpp", | ||
| csharp = "csharp", | ||
| go = "go", | ||
| java = "java", | ||
| javascript = "javascript", | ||
| python = "python", | ||
| ruby = "ruby", | ||
| rust = "rust", | ||
| swift = "swift", | ||
| } |
There was a problem hiding this comment.
maybe it's not worth it, but I wonder if there would be an easy way to have builtin.json as the source of truth without having to repeat the list here. I know it should be easy if we use a union type instead of an enum. Or we could have a const record built from builtin.json. Or we could have codegen do this for us. Anyway, I'll leave to you to decide whether it's worth it, it's true that adding new language support is a rare enough event (for now 😉)
There was a problem hiding this comment.
I believe Henry's intention with this is to explicitly have a duplicate of languages that we treat in special ways in the Action vs the complete list of languages (although the comment disagrees?).
There was a problem hiding this comment.
maybe it's not worth it, but I wonder if there would be an easy way to have
builtin.jsonas the source of truth without having to repeat the list here. I know it should be easy if we use a union type instead of anenum. Or we could have a const record built frombuiltin.json. Or we could have codegen do this for us.
I explored different approaches to this, but I couldn't find a simple way of using the JSON directly without losing exhaustivity checking on BuiltInLanguage. I think using a codegen library would allow us to keep exhaustivity checking, but it seems too heavyweight given the frequency at which we add new languages.
I believe Henry's intention with this is to explicitly have a duplicate of languages that we treat in special ways in the Action vs the complete list of languages (although the comment disagrees?).
This PR reflects a change of approach to store the full list of supported languages in the linked CodeQL bundle in service of better type-level checks (like the start-proxy Action checks we discussed offline).
| { | ||
| "extends": "../../../tsconfig.json", | ||
| "compilerOptions": { | ||
| "lib": ["esnext"], | ||
| "rootDir": "../../..", | ||
| "sourceMap": false, | ||
| "noEmit": true, | ||
| }, | ||
| "include": ["./*.ts"], | ||
| "exclude": ["node_modules"] | ||
| } |
There was a problem hiding this comment.
Minor: We should probably agree on where the TypeScript scripts used in CI live going forward. When I converted some of the scripts from .github/workflows to TypeScript I put them in pr-checks/ since I had already set up the infrastructure for TypeScript scripts there. In general, it would be good to avoid duplicating this secondary project setup across two locations.
There was a problem hiding this comment.
I'm happy with moving this to pr-checks.
|
|
||
| const codeqlPath = process.argv[2] || "codeql"; | ||
|
|
||
| // Step 1: Resolve all language extractor directories. |
There was a problem hiding this comment.
It would be good to encapsulate the different steps in functions so that we can add tests if desired.
| console.log(` ✅ ${language}: included (default_queries: ${JSON.stringify(defaultQueries)})`); | ||
| languages.push(language); | ||
| } else { | ||
| console.log(` ❌ ${language}: excluded (no default queries)`); |
There was a problem hiding this comment.
Tell me Copilot generated this code without telling me that Copilot generated this code.
There was a problem hiding this comment.
Hmm, I'll fix the default_queries / default queries discrepancy. I'm keeping the emoji though :)
| const outputPath = path.join( | ||
| __dirname, | ||
| "..", | ||
| "..", | ||
| "..", | ||
| "src", | ||
| "languages", | ||
| "builtin.json", | ||
| ); |
There was a problem hiding this comment.
In the pr-checks setup, we'd want this to be a constant in config.ts
src/languages/index.ts
Outdated
| } | ||
|
|
||
| /** | ||
| * Parses a built-in language name or alias. |
There was a problem hiding this comment.
Minor: Make it clearer in the wording that this function parses an arbitrary string into a built-in language name, resolving aliases in the process.
The parseLanguage function that's now removed had more extensive documentation:
/**
* Parse the start-proxy language input into its canonical CodeQL language name.
*
* This uses the language aliases shipped with the Action and will not be able to resolve aliases
* added by versions of the CodeQL CLI newer than the one mentioned in `defaults.json`. However,
* this is sufficient for the start-proxy Action since we are already specifying proxy
* configurations on a per-language basis.
*/
Avoid new source code changing expected output
While we want the CodeQL Action to work with third-party language support, having a list of all built-in languages can help us create better type-level checks to ensure that we don't miss things that we want to customize for each of our built-in languages.
For example, in the
start-proxyAction, we want to make sure each built-in language is associated with a registries config. If we add a new language, we want to be reminded to update that config.This means that when we add a new language, we will need to update the
BuiltInLanguageenum (and potentially other places in the code that exhaustively enumerateBuiltInLanguagefor safety reasons). This is a rare event and probably worth it from a safety and correctness perspective.Commit-by-commit review recommended as we rename
KnownLanguagetoBuiltInLanguageto reflect the new motivation behind this type.Risk assessment
For internal use only. Please select the risk level of this change:
Which use cases does this change impact?
Workflow types:
dynamicworkflows (Default Setup, Code Quality, ...).Products:
analysis-kinds: code-scanning.analysis-kinds: code-quality.upload-sarifaction.Environments:
github.comand/or GitHub Enterprise Cloud with Data Residency.How did/will you validate this change?
.test.tsfiles).pr-checks).If something goes wrong after this change is released, what are the mitigation and rollback strategies?
How will you know if something goes wrong after this change is released?
Are there any special considerations for merging or releasing this change?
Merge / deployment checklist