-
Notifications
You must be signed in to change notification settings - Fork 343
feat: add map contains when map used #1497
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
🦋 Changeset detectedLatest commit: 6fed8ca The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
Code ReviewCritical Issues
Important Issues
Minor Issues
Recommendation: Fix the SQL injection and feature flag logic before merge. |
E2E Test Results✅ All tests passed • 46 passed • 3 skipped • 816s
Tests ran across 4 shards in parallel. |
| exports[`renderChartConfig HAVING clause should render HAVING clause with SQL language 1`] = `"SELECT count(),severity FROM default.logs WHERE (timestamp >= fromUnixTimestamp64Milli(1739318400000) AND timestamp <= fromUnixTimestamp64Milli(1739491200000)) GROUP BY severity HAVING count(*) > 100"`; | ||
| exports[`renderChartConfig HAVING clause should render HAVING clause with SQL language 1`] = `"SELECT count(),severity FROM default.logs WHERE (timestamp >= fromUnixTimestamp64Milli(1739318400000) AND timestamp <= fromUnixTimestamp64Milli(1739491200000)) GROUP BY severity HAVING COUNT(*) > 100"`; | ||
|
|
||
| exports[`renderChartConfig HAVING clause should render HAVING clause with granularity and groupBy 1`] = `"SELECT count(),event_type,toStartOfInterval(toDateTime(timestamp), INTERVAL 5 minute) AS \`__hdx_time_bucket\` FROM default.events WHERE (timestamp >= fromUnixTimestamp64Milli(1739318400000) AND timestamp <= fromUnixTimestamp64Milli(1739491200000)) GROUP BY event_type,toStartOfInterval(toDateTime(timestamp), INTERVAL 5 minute) AS \`__hdx_time_bucket\` HAVING count(*) > 50 ORDER BY toStartOfInterval(toDateTime(timestamp), INTERVAL 5 minute) AS \`__hdx_time_bucket\`"`; | ||
| exports[`renderChartConfig HAVING clause should render HAVING clause with granularity and groupBy 1`] = `"SELECT count(),event_type,toStartOfInterval(toDateTime(timestamp), INTERVAL 5 minute) AS \`__hdx_time_bucket\` FROM default.events WHERE (timestamp >= fromUnixTimestamp64Milli(1739318400000) AND timestamp <= fromUnixTimestamp64Milli(1739491200000)) GROUP BY event_type,toStartOfInterval(toDateTime(timestamp), INTERVAL 5 minute) AS \`__hdx_time_bucket\` HAVING COUNT(*) > 50 ORDER BY toStartOfInterval(toDateTime(timestamp), INTERVAL 5 minute) AS \`__hdx_time_bucket\`"`; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These cases never ran it through the sql parser previously, so the sql parser just capitalizes a few things. Otherwise the tests didn't change
| expect(actual.toLowerCase()).toContain( | ||
| 'avg(response_time) > 500 and count(*) > 10', | ||
| ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just case changes
| }; | ||
| }; | ||
|
|
||
| const optimizeMapAccessWhere = ({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Builds ast, traverses extracting each ident, adds mapContains to the ast if the proper conditions are met, and builds back into sql
| case 'column_ref': { | ||
| const ident = extractIdent(node as ColumnRef); | ||
| if (ident) { | ||
| idents.push({ doesContain: true, ident }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either a column or map, we want both since a materialized column could be a map key optimization in disguise
| // replace materialized idents with map ident | ||
| for (const curIdent of idents) { | ||
| if (curIdent.ident.type === 'column') { | ||
| const materializedMapIdent = materializedColumnToMapIdent.get( | ||
| curIdent.ident.name, | ||
| ); | ||
| if (materializedMapIdent) { | ||
| curIdent.ident = materializedMapIdent; | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This allows us to add mapContains even if that map entry is materialized, which is advantageous to still use the map key index
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if the materialized field would be beneficial from this optimization. but good to know
| import { CustomSchemaSQLSerializerV2, SearchQueryBuilder } from '@/queryParser'; | ||
|
|
||
| const MAP_CONTAINS_OPTIMIZATION_ENABLED = | ||
| process.env.NEXT_PUBLIC_MAP_CONTAINS_OPTIMIZATION_ENABLED || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: better to follow the pattern like (process.env.NEXT_PUBLIC_MAP_CONTAINS_OPTIMIZATION_ENABLED ?? 'false') === 'true'
and move it to config.ts
| const maps = new Set( | ||
| columns.filter(v => v.type.startsWith('Map')).map(v => v.name), | ||
| ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perf: We can move this out of the method, and we don’t need to traverse the AST if it’s empty. Also maps is a bit ambiguous, maybe mapFieldNames ?
|
|
||
| return parser.sqlify(ast); | ||
| } catch { | ||
| // ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we log errors during development for debugging purposes?
| } | ||
|
|
||
| // add map idents to AST | ||
| const addIdentToAst = (ident: SQLMapValueIdent, doesContain: boolean) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: We can probably move this function out instead of manipulating the AST directly
|
|
||
| const MAP_CONTAINS_OPTIMIZATION_ENABLED = | ||
| process.env.NEXT_PUBLIC_MAP_CONTAINS_OPTIMIZATION_ENABLED || | ||
| process.env.NODE_ENV !== 'production'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest removing this (in case somehow NODE_ENV is not set properly in prod) and add NEXT_PUBLIC_MAP_CONTAINS_OPTIMIZATION_ENABLED to your .env.local instead
Closes HDX-3070
Adding
mapContainsallows a bloom filter index to be used to not search a granule if a key for a given map is not present in that granule. In some testing I've done it yielded 40% less rows scanned