Static JavaScript code analysis inside a SQL query: JSHint+GitHub+BigQuery

#set UDF Source URI option to "gs://fh-bigquery/js/jshint-2.5.11.js"SELECT x error, COUNT(*) files_affected
FROM js(
(
SELECT content, sample_path, sample_repo_name
FROM [fh-bigquery:github_extracts.contents_js]
WHERE LENGTH(content) BETWEEN 1000 AND 1800
AND ABS(HASH(id))%1000=0 # sampling
),
content, sample_path, sample_repo_name,
"[
{name: 'x', type:'string'},
{name: 'sample_path', type:'string'},
{name: 'sample_repo_name', type:'string'},
{name: 'content', type:'string'}]",
"function(r, emit) {
JSHINT(r.content, {'maxdepth':2});
// data = JSHINT.data();
errors = JSHINT.errors;
set_errors=new Set(errors.map(
function(x) {
if(x && 'raw' in x) {return x.raw}}));
set_errors.forEach(function(x) {
if(!x) {return;}
emit({
x: x,
sample_repo_name: r.sample_repo_name,
sample_path: r.sample_path,
});
});
}")
GROUP BY 1
ORDER BY 2 DESC
LIMIT 100

7.4s elapsed, 103 GB processed
  • This is some heavy weight JavaScript code — we are running a static JavaScript code analyzer inside BigQuery — and it works. That’s pretty cool.
  • I’m running this code over a sample of all JS files (see query for current filters). There’s a lot that you can do with BigQuery and SQL, and as we push the boundaries some code will run better if we work over smaller datasets. In the meantime it would be nice if there was a lighter weight JSHint equivalent.
  • I’m using JSHint 2.5.11 as newer versions fail. Ping me if you find out how to solve this.
  • The above query does not following the official BigQuery UDF supported syntax. See the docs for the correct format, but I’m using this style as it’s easier to share this way.

--

--

--

Data Cloud Advocate at Snowflake ❄️. Originally from Chile, now in San Francisco and around the world. Previously at Google. Let’s talk data.

Love podcasts or audiobooks? Learn on the go with our new app.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Felipe Hoffa

Felipe Hoffa

Data Cloud Advocate at Snowflake ❄️. Originally from Chile, now in San Francisco and around the world. Previously at Google. Let’s talk data.

More from Medium

SQL Server Integration Services (SSIS) Deployment Strategies

My first python microservice with PyMS

Should I cache my Queries?

5 Best API Integration Tools for Developers in 2022 | Optymize