
Conversation

@knudtty
Contributor

@knudtty knudtty commented Dec 17, 2025

Closes HDX-3070

Adding mapContains allows a bloom filter index to be used to skip searching a granule when a key for a given map is not present in that granule. In the testing I've done, this yielded 40% fewer rows scanned.
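
For illustration, a minimal sketch of the rewrite this enables; the table, column, and key names below are made up, not from this PR:

```ts
// Hypothetical example: the optimizer ANDs a mapContains guard into the
// WHERE clause, so a bloom filter index over the map's keys can rule out
// granules that never contain the key at all.
const before = `SELECT Body FROM logs WHERE LogAttributes['level'] = 'error'`;
const after = `SELECT Body FROM logs WHERE mapContains(LogAttributes, 'level') AND LogAttributes['level'] = 'error'`;
```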

@changeset-bot

changeset-bot bot commented Dec 17, 2025

🦋 Changeset detected

Latest commit: 6fed8ca

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name                    Type
@hyperdx/common-utils   Patch


@vercel

vercel bot commented Dec 17, 2025

The latest updates on your projects.

Project              Deployment   Review             Updated (UTC)
hyperdx-v2-oss-app   Ready        Preview, Comment   Dec 18, 2025 11:14pm

@knudtty knudtty requested review from a team and wrn14897 and removed request for a team December 17, 2025 21:04
@claude

claude bot commented Dec 17, 2025

Code Review

Critical Issues

  • SQL injection vulnerability in renderChartConfig.ts:510: map keys are not escaped before string interpolation. Use backticks or proper SQL escaping for mapName, and escape quotes in keyName.

  • Feature flag logic error in renderChartConfig.ts:15-17: the || operator causes the optimization to be ALWAYS enabled (truthy in non-production). Should be: process.env.NEXT_PUBLIC_MAP_CONTAINS_OPTIMIZATION_ENABLED === 'true' || process.env.NODE_ENV !== 'production'. A sketch of these first two fixes follows this list.

  • Missing null safety in renderChartConfig.ts:467: cur.default_expression could be undefined/null for materialized columns. Add a null check before .indexOf().
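
A minimal sketch of the first two fixes, using the names from the review rather than the actual PR code:

```ts
// Feature flag: compare against the string 'true' so an unset or arbitrary
// value doesn't silently enable the optimization.
const MAP_CONTAINS_OPTIMIZATION_ENABLED =
  process.env.NEXT_PUBLIC_MAP_CONTAINS_OPTIMIZATION_ENABLED === 'true' ||
  process.env.NODE_ENV !== 'production';

// Escape backslashes and single quotes in the key, and backtick-quote the
// map name, before interpolating into mapContains(...).
const escapeKeyName = (keyName: string): string =>
  keyName.replace(/\\/g, '\\\\').replace(/'/g, "\\'");

const mapContainsExpr = (mapName: string, keyName: string): string =>
  `mapContains(\`${mapName}\`, '${escapeKeyName(keyName)}')`;
```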

Important Issues

  • Inconsistent error handling in renderChartConfig.ts:378, 533: empty catch blocks silently swallow all errors, including TypeErrors. Consider logging errors in development or catching specific parser errors only.

  • Missing validation in extractIdent:319: node.array_index[0].index.value needs type validation; verify it is a string before using it as keyName (see the sketch after this list).
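
A possible guard for the second point; the field path comes from the review, the surrounding code is assumed:

```ts
// Inside extractIdent: only accept a plain string key as keyName.
const key = node.array_index?.[0]?.index?.value;
if (typeof key !== 'string') {
  return undefined; // numeric or expression index: not a simple map key
}
```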

Minor Issues

  • Tests are comprehensive and cover edge cases well
  • CTE detection logic is correct
  • Deduplication using objectHash is appropriate

Recommendation: Fix the SQL injection and feature flag logic before merge.

@github-actions
Contributor

github-actions bot commented Dec 17, 2025

E2E Test Results

All tests passed • 46 passed • 3 skipped • 816s

Status Count
✅ Passed 46
❌ Failed 0
⚠️ Flaky 1
⏭️ Skipped 3

Tests ran across 4 shards in parallel.


Comment on lines -7 to +9
exports[`renderChartConfig HAVING clause should render HAVING clause with SQL language 1`] = `"SELECT count(),severity FROM default.logs WHERE (timestamp >= fromUnixTimestamp64Milli(1739318400000) AND timestamp <= fromUnixTimestamp64Milli(1739491200000)) GROUP BY severity HAVING count(*) > 100"`;
exports[`renderChartConfig HAVING clause should render HAVING clause with SQL language 1`] = `"SELECT count(),severity FROM default.logs WHERE (timestamp >= fromUnixTimestamp64Milli(1739318400000) AND timestamp <= fromUnixTimestamp64Milli(1739491200000)) GROUP BY severity HAVING COUNT(*) > 100"`;

exports[`renderChartConfig HAVING clause should render HAVING clause with granularity and groupBy 1`] = `"SELECT count(),event_type,toStartOfInterval(toDateTime(timestamp), INTERVAL 5 minute) AS \`__hdx_time_bucket\` FROM default.events WHERE (timestamp >= fromUnixTimestamp64Milli(1739318400000) AND timestamp <= fromUnixTimestamp64Milli(1739491200000)) GROUP BY event_type,toStartOfInterval(toDateTime(timestamp), INTERVAL 5 minute) AS \`__hdx_time_bucket\` HAVING count(*) > 50 ORDER BY toStartOfInterval(toDateTime(timestamp), INTERVAL 5 minute) AS \`__hdx_time_bucket\`"`;
exports[`renderChartConfig HAVING clause should render HAVING clause with granularity and groupBy 1`] = `"SELECT count(),event_type,toStartOfInterval(toDateTime(timestamp), INTERVAL 5 minute) AS \`__hdx_time_bucket\` FROM default.events WHERE (timestamp >= fromUnixTimestamp64Milli(1739318400000) AND timestamp <= fromUnixTimestamp64Milli(1739491200000)) GROUP BY event_type,toStartOfInterval(toDateTime(timestamp), INTERVAL 5 minute) AS \`__hdx_time_bucket\` HAVING COUNT(*) > 50 ORDER BY toStartOfInterval(toDateTime(timestamp), INTERVAL 5 minute) AS \`__hdx_time_bucket\`"`;
Contributor Author

These cases never ran through the SQL parser previously, so the parser now just capitalizes a few things. Otherwise the tests didn't change.

Comment on lines +701 to +703
expect(actual.toLowerCase()).toContain(
'avg(response_time) > 500 and count(*) > 10',
);
Contributor Author

Just case changes.

};
};

const optimizeMapAccessWhere = ({
Contributor Author

Builds the AST, traverses it extracting each ident, adds mapContains to the AST if the proper conditions are met, and builds it back into SQL.
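
A rough sketch of that shape, assuming node-sql-parser's astify/sqlify API; the SQLMapValueIdent shape and the collectIdents traversal are hypothetical stand-ins, and addIdentToAst mirrors the helper shown further down:

```ts
import { Parser } from 'node-sql-parser';

// Assumed shapes, declared only so the sketch type-checks.
type SQLMapValueIdent = { type: 'map'; name: string; keyName: string };
declare function collectIdents(
  ast: unknown,
): { ident: SQLMapValueIdent; doesContain: boolean }[];
declare function addIdentToAst(
  ident: SQLMapValueIdent,
  doesContain: boolean,
): void;

const parser = new Parser();

function optimizeMapAccessWhere(
  sql: string,
  mapFieldNames: Set<string>,
): string | undefined {
  try {
    const ast = parser.astify(sql); // build the AST
    for (const { ident, doesContain } of collectIdents(ast)) {
      if (mapFieldNames.has(ident.name)) {
        addIdentToAst(ident, doesContain); // AND in a mapContains(...) guard
      }
    }
    return parser.sqlify(ast); // build back into SQL
  } catch {
    return undefined; // on parse failure, fall back to the original query
  }
}
```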

case 'column_ref': {
const ident = extractIdent(node as ColumnRef);
if (ident) {
idents.push({ doesContain: true, ident });
Contributor Author

Either a column or a map; we want both, since a materialized column could be a map key optimization in disguise.

Comment on lines +475 to +485
// replace materialized idents with map ident
for (const curIdent of idents) {
if (curIdent.ident.type === 'column') {
const materializedMapIdent = materializedColumnToMapIdent.get(
curIdent.ident.name,
);
if (materializedMapIdent) {
curIdent.ident = materializedMapIdent;
}
}
}
Contributor Author

This allows us to add mapContains even when that map entry is materialized, which is advantageous because it still makes use of the map key index.
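
For context, a hedged sketch of how that mapping could be built from column metadata; the regex and ident shape are illustrative, not the PR's actual code:

```ts
// A column defined as, e.g.
//   `mat_severity` String MATERIALIZED LogAttributes['severity']
// is a map key access in disguise; mapping it back to the map ident lets
// the optimizer still emit mapContains(LogAttributes, 'severity').
function buildMaterializedColumnToMapIdent(
  columns: { name: string; default_expression?: string | null }[],
): Map<string, { mapName: string; keyName: string }> {
  const result = new Map<string, { mapName: string; keyName: string }>();
  for (const col of columns) {
    const expr = col.default_expression;
    if (expr == null) continue; // guard: may be missing (see review above)
    const match = expr.match(/^(\w+)\['([^']+)'\]$/);
    if (match) {
      result.set(col.name, { mapName: match[1], keyName: match[2] });
    }
  }
  return result;
}
```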

Member

I'm not sure the materialized field would benefit from this optimization, but good to know.

import { CustomSchemaSQLSerializerV2, SearchQueryBuilder } from '@/queryParser';

const MAP_CONTAINS_OPTIMIZATION_ENABLED =
process.env.NEXT_PUBLIC_MAP_CONTAINS_OPTIMIZATION_ENABLED ||
Member

@wrn14897 wrn14897 Dec 19, 2025

style: better to follow the pattern like (process.env.NEXT_PUBLIC_MAP_CONTAINS_OPTIMIZATION_ENABLED ?? 'false') === 'true'
and move it to config.ts
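
i.e., something along these lines (a sketch of the suggested pattern, placed in config.ts per the comment):

```ts
// config.ts
export const IS_MAP_CONTAINS_OPTIMIZATION_ENABLED =
  (process.env.NEXT_PUBLIC_MAP_CONTAINS_OPTIMIZATION_ENABLED ?? 'false') ===
  'true';
```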

Comment on lines +459 to +461
const maps = new Set(
columns.filter(v => v.type.startsWith('Map')).map(v => v.name),
);
Member

@wrn14897 wrn14897 Dec 19, 2025

perf: We can move this out of the method, and we don't need to traverse the AST if it's empty. Also, maps is a bit ambiguous; maybe mapFieldNames?
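
A sketch of the suggested shape; the column type and the early-return placement are assumed:

```ts
// Compute once from the column metadata instead of on every call:
function getMapFieldNames(
  columns: { name: string; type: string }[],
): Set<string> {
  return new Set(
    columns.filter(v => v.type.startsWith('Map')).map(v => v.name),
  );
}

// Then, at the top of the optimizer, skip the AST traversal entirely:
//   if (mapFieldNames.size === 0) return undefined;
```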


return parser.sqlify(ast);
} catch {
// ignore
Member

Should we log errors during development for debugging purposes?
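
e.g., a dev-only log in that catch (a sketch, not the PR's code):

```ts
try {
  return parser.sqlify(ast);
} catch (e) {
  if (process.env.NODE_ENV !== 'production') {
    // Surface parser failures while developing; stay silent in production.
    console.warn('mapContains optimization skipped:', e);
  }
}
```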

}

// add map idents to AST
const addIdentToAst = (ident: SQLMapValueIdent, doesContain: boolean) => {
Member

style: We can probably move this function out instead of manipulating the AST directly


const MAP_CONTAINS_OPTIMIZATION_ENABLED =
process.env.NEXT_PUBLIC_MAP_CONTAINS_OPTIMIZATION_ENABLED ||
process.env.NODE_ENV !== 'production';
Member

I'd suggest removing this (in case NODE_ENV is somehow not set properly in prod) and adding NEXT_PUBLIC_MAP_CONTAINS_OPTIMIZATION_ENABLED to your .env.local instead.
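
i.e., drop the NODE_ENV fallback entirely and opt in locally, e.g.:

```
# .env.local
NEXT_PUBLIC_MAP_CONTAINS_OPTIMIZATION_ENABLED=true
```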
