Skip to content

Store all built-in languages#3811

Open
henrymercer wants to merge 9 commits intomainfrom
henrymercer/record-all-builtin-languages
Open

Store all built-in languages#3811
henrymercer wants to merge 9 commits intomainfrom
henrymercer/record-all-builtin-languages

Conversation

@henrymercer
Copy link
Copy Markdown
Contributor

While we want the CodeQL Action to work with third-party language support, having a list of all built-in languages can help us create better type-level checks to ensure that we don't miss things that we want to customize for each of our built-in languages.

For example, in the start-proxy Action, we want to make sure each built-in language is associated with a registries config. If we add a new language, we want to be reminded to update that config.

This means that when we add a new language, we will need to update the BuiltInLanguage enum (and potentially other places in the code that exhaustively enumerate BuiltInLanguage for safety reasons). This is a rare event and probably worth it from a safety and correctness perspective.

Commit-by-commit review recommended as we rename KnownLanguage to BuiltInLanguage to reflect the new motivation behind this type.

Risk assessment

For internal use only. Please select the risk level of this change:

  • Low risk: Changes are fully under feature flags, or have been fully tested and validated in pre-production environments and are highly observable, or are documentation or test only.

Which use cases does this change impact?

Workflow types:

  • Advanced setup - Impacts users who have custom CodeQL workflows.
  • Managed - Impacts users with dynamic workflows (Default Setup, Code Quality, ...).

Products:

  • Code Scanning - The changes impact analyses when analysis-kinds: code-scanning.
  • Code Quality - The changes impact analyses when analysis-kinds: code-quality.
  • Other first-party - The changes impact other first-party analyses.
  • Third-party analyses - The changes affect the upload-sarif action.

Environments:

  • Dotcom - Impacts CodeQL workflows on github.com and/or GitHub Enterprise Cloud with Data Residency.
  • GHES - Impacts CodeQL workflows on GitHub Enterprise Server.

How did/will you validate this change?

  • Unit tests - I am depending on unit test coverage (i.e. tests in .test.ts files).
  • End-to-end tests - I am depending on PR checks (i.e. tests in pr-checks).

If something goes wrong after this change is released, what are the mitigation and rollback strategies?

  • Rollback - Change can only be disabled by rolling back the release or releasing a new version with a fix.

How will you know if something goes wrong after this change is released?

  • Telemetry - I rely on existing telemetry or have made changes to the telemetry.
    • Alerts - New or existing monitors will trip if something goes wrong with this change.

Are there any special considerations for merging or releasing this change?

  • No special considerations - This change can be merged at any time.

Merge / deployment checklist

  • Confirm this change is backwards compatible with existing workflows.
  • Consider adding a changelog entry for this change.
  • Confirm the readme and docs have been updated if necessary.

While we want the CodeQL Action to work with third-party language support, having a list of all built-in languages can help us create better type-level checks to ensure that we don't miss things that we want to customize for each of our built-in languages.
@henrymercer henrymercer requested a review from a team as a code owner April 10, 2026 18:15
Copilot AI review requested due to automatic review settings April 10, 2026 18:15
@github-actions github-actions bot added the size/XL May be very hard to review label Apr 10, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR renames the “known language” concept to “built-in language” and introduces a curated, updateable source of truth for all built-in CodeQL languages (plus aliases) to enable stricter type-level exhaustiveness checks across the Action.

Changes:

  • Replace KnownLanguage with BuiltInLanguage throughout the TypeScript codebase and tests.
  • Add src/languages/builtin.json + parseBuiltInLanguage/isBuiltInLanguage utilities, with unit tests to keep the enum and JSON synchronized.
  • Update the bundle update workflow to regenerate the built-in language list via a new script.
Show a summary per file
File Description
src/trap-caching.test.ts Update tests to use BuiltInLanguage instead of KnownLanguage.
src/tracer-config.test.ts Update tests to use BuiltInLanguage.
src/status-report.test.ts Update tests to use BuiltInLanguage.
src/start-proxy/environment.ts Switch language-specific proxy environment checks to BuiltInLanguage.
src/start-proxy/environment.test.ts Update start-proxy environment tests for BuiltInLanguage.
src/start-proxy.ts Replace KnownLanguage usage with BuiltInLanguage; remove old parseLanguage logic in favor of centralized parsing.
src/start-proxy.test.ts Update tests to use BuiltInLanguage; remove parseLanguage tests (now covered in languages module tests).
src/start-proxy-action.ts Use parseBuiltInLanguage and BuiltInLanguage for the language input.
src/overlay/index.test.ts Update overlay tests to use BuiltInLanguage.
src/languages/index.ts Introduce BuiltInLanguage, isBuiltInLanguage, and parseBuiltInLanguage, backed by builtin.json.
src/languages/index.test.ts Add unit tests for parsing, membership checks, and enum↔JSON synchronization.
src/languages/builtin.json New curated list of built-in languages and aliases.
src/languages.ts Remove the old KnownLanguage/JavaEnvVars definitions (moved to src/languages/index.ts).
src/known-language-aliases.json Remove the old alias JSON (replaced by src/languages/builtin.json).
src/init.ts Update language checks to BuiltInLanguage.
src/init.test.ts Update tests/types to BuiltInLanguage.
src/init-action.ts Update language-specific behavior checks to BuiltInLanguage.
src/dependency-caching.ts Update language-specific feature gating to BuiltInLanguage.
src/dependency-caching.test.ts Update dependency caching tests to BuiltInLanguage.
src/database-upload.test.ts Update database upload tests to BuiltInLanguage.
src/config/db-config.test.ts Update config parsing tests to BuiltInLanguage.
src/config-utils.ts Update supported-language filtering and language-specific logic to BuiltInLanguage.
src/config-utils.test.ts Update config utils tests to BuiltInLanguage.
src/codeql.test.ts Update CodeQL wrapper tests to BuiltInLanguage.
src/autobuild.ts Update autobuild ordering and language checks to BuiltInLanguage.
src/analyze.ts Update status-report typing and extraction conditionals to BuiltInLanguage.
src/analyze.test.ts Update analysis tests to BuiltInLanguage.
src/analyze-action.ts Update Go legacy workflow logic to use BuiltInLanguage.
pr-checks/sync.ts Update PR checks generator typing to BuiltInLanguage.
package.json Extend transpile script to also typecheck workflow scripts tsconfig.
.github/workflows/update-bundle.yml Replace alias-update step with built-in-language regeneration via new script.
.github/workflows/script/update-builtin-languages.ts New script to regenerate src/languages/builtin.json using the CodeQL CLI.
.github/workflows/script/tsconfig.json New tsconfig for typechecking workflow scripts (no emit).
lib/upload-sarif-action.js Generated output update (not reviewed).
lib/upload-sarif-action-post.js Generated output update (not reviewed).
lib/upload-lib.js Generated output update (not reviewed).
lib/start-proxy-action.js Generated output update (not reviewed).
lib/start-proxy-action-post.js Generated output update (not reviewed).
lib/setup-codeql-action.js Generated output update (not reviewed).
lib/resolve-environment-action.js Generated output update (not reviewed).
lib/init-action.js Generated output update (not reviewed).
lib/init-action-post.js Generated output update (not reviewed).
lib/autobuild-action.js Generated output update (not reviewed).
lib/analyze-action.js Generated output update (not reviewed).
lib/analyze-action-post.js Generated output update (not reviewed).

Copilot's findings

  • Files reviewed: 33/45 changed files
  • Comments generated: 1

@@ -0,0 +1,88 @@
#!/usr/bin/env npx tsx
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The shebang #!/usr/bin/env npx tsx is not portable: most env implementations can’t take multiple arguments from a shebang line (without -S), so running this script directly would fail. Either remove the shebang (since the workflow already runs it via npx tsx ...) or switch to an env -S style shebang.

Suggested change
#!/usr/bin/env npx tsx

Copilot uses AI. Check for mistakes.
Comment on lines +23 to +29
const resolveJson: Record<string, string[]> = JSON.parse(
execFileSync(
codeqlPath,
["resolve", "languages", "--format=json"],
{ encoding: "utf8" },
),
);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typically new languages start out as experimental, and as such require CODEQL_ENABLE_EXPERIMENTAL_FEATURES variable to be resolved. To be future ready, we probably need to set that here

Comment on lines +7 to +18
export enum BuiltInLanguage {
actions = "actions",
cpp = "cpp",
csharp = "csharp",
go = "go",
java = "java",
javascript = "javascript",
python = "python",
ruby = "ruby",
rust = "rust",
swift = "swift",
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe it's not worth it, but I wonder if there would be an easy way to have builtin.json as the source of truth without having to repeat the list here. I know it should be easy if we use a union type instead of an enum. Or we could have a const record built from builtin.json. Or we could have codegen do this for us. Anyway, I'll leave to you to decide whether it's worth it, it's true that adding new language support is a rare enough event (for now 😉)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe Henry's intention with this is to explicitly have a duplicate of languages that we treat in special ways in the Action vs the complete list of languages (although the comment disagrees?).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe it's not worth it, but I wonder if there would be an easy way to have builtin.json as the source of truth without having to repeat the list here. I know it should be easy if we use a union type instead of an enum. Or we could have a const record built from builtin.json. Or we could have codegen do this for us.

I explored different approaches to this, but I couldn't find a simple way of using the JSON directly without losing exhaustivity checking on BuiltInLanguage. I think using a codegen library would allow us to keep exhaustivity checking, but it seems too heavyweight given the frequency at which we add new languages.

I believe Henry's intention with this is to explicitly have a duplicate of languages that we treat in special ways in the Action vs the complete list of languages (although the comment disagrees?).

This PR reflects a change of approach to store the full list of supported languages in the linked CodeQL bundle in service of better type-level checks (like the start-proxy Action checks we discussed offline).

Comment on lines +1 to +11
{
"extends": "../../../tsconfig.json",
"compilerOptions": {
"lib": ["esnext"],
"rootDir": "../../..",
"sourceMap": false,
"noEmit": true,
},
"include": ["./*.ts"],
"exclude": ["node_modules"]
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: We should probably agree on where the TypeScript scripts used in CI live going forward. When I converted some of the scripts from .github/workflows to TypeScript I put them in pr-checks/ since I had already set up the infrastructure for TypeScript scripts there. In general, it would be good to avoid duplicating this secondary project setup across two locations.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with moving this to pr-checks.


const codeqlPath = process.argv[2] || "codeql";

// Step 1: Resolve all language extractor directories.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to encapsulate the different steps in functions so that we can add tests if desired.

Comment on lines +46 to +49
console.log(` ✅ ${language}: included (default_queries: ${JSON.stringify(defaultQueries)})`);
languages.push(language);
} else {
console.log(` ❌ ${language}: excluded (no default queries)`);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tell me Copilot generated this code without telling me that Copilot generated this code.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I'll fix the default_queries / default queries discrepancy. I'm keeping the emoji though :)

Comment on lines +72 to +80
const outputPath = path.join(
__dirname,
"..",
"..",
"..",
"src",
"languages",
"builtin.json",
);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the pr-checks setup, we'd want this to be a constant in config.ts

}

/**
* Parses a built-in language name or alias.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: Make it clearer in the wording that this function parses an arbitrary string into a built-in language name, resolving aliases in the process.

The parseLanguage function that's now removed had more extensive documentation:

/**
 * Parse the start-proxy language input into its canonical CodeQL language name.
 *
 * This uses the language aliases shipped with the Action and will not be able to resolve aliases
 * added by versions of the CodeQL CLI newer than the one mentioned in `defaults.json`. However,
 * this is sufficient for the start-proxy Action since we are already specifying proxy
 * configurations on a per-language basis.
 */

@henrymercer henrymercer requested review from mbg and redsun82 April 13, 2026 17:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/XL May be very hard to review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants