Static JavaScript code analysis inside a SQL query: JSHint+GitHub+BigQuery

Felipe Hoffa
2 min read · Jun 29, 2016


Can we run a static code analysis tool for JavaScript inside BigQuery? Yes we can.

The following BigQuery query runs JSHint and reports the most common errors in a sample of all open source JavaScript code on GitHub:

#set UDF Source URI option to "gs://fh-bigquery/js/jshint-2.5.11.js"
SELECT x error, COUNT(*) files_affected
FROM js(
  (
    SELECT content, sample_path, sample_repo_name
    FROM [fh-bigquery:github_extracts.contents_js]
    WHERE LENGTH(content) BETWEEN 1000 AND 1800
      AND ABS(HASH(id))%1000=0 # sampling
  ),
  content, sample_path, sample_repo_name,
  "[
    {name: 'x', type:'string'},
    {name: 'sample_path', type:'string'},
    {name: 'sample_repo_name', type:'string'},
    {name: 'content', type:'string'}]",
  "function(r, emit) {
    JSHINT(r.content, {'maxdepth':2});
    // data = JSHINT.data();
    errors = JSHINT.errors;
    set_errors=new Set(errors.map(
      function(x) {
        if(x && 'raw' in x) {return x.raw}}));
    set_errors.forEach(function(x) {
      if(!x) {return;}
      emit({
        x: x,
        sample_repo_name: r.sample_repo_name,
        sample_path: r.sample_path,
      });
    });
  }")
GROUP BY 1
ORDER BY 2 DESC
LIMIT 100

7.4s elapsed, 103 GB processed
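
The JavaScript in the query emits each distinct message only once per file, which is why the final COUNT(*) counts files rather than individual warnings. A minimal sketch of that step, runnable outside BigQuery, might look like the following. The helper name `uniqueRawErrors` and the sample data are hypothetical; the only assumption about JSHint is the one the query's own guards imply, namely that each entry of `JSHINT.errors` is either an object with a `raw` message template or a falsy placeholder.

```javascript
// Hypothetical helper: collapse one file's JSHint-style error list
// into the distinct `raw` message templates it contains.
function uniqueRawErrors(errors) {
  const seen = new Set();
  for (const e of errors) {
    // Skip falsy placeholders and entries without a usable `raw` field.
    if (e && typeof e === 'object' && 'raw' in e && e.raw) {
      seen.add(e.raw);
    }
  }
  // A Set keeps insertion order, so messages come back in first-seen order.
  return Array.from(seen);
}

// Made-up sample shaped like JSHINT.errors for a single file.
const sampleErrors = [
  { raw: 'Missing semicolon.', line: 3 },
  { raw: 'Missing semicolon.', line: 9 },
  { raw: "'{a}' is not defined.", line: 12 },
  null,
];

console.log(uniqueRawErrors(sampleErrors));
```

Grouping on `raw` rather than on a fully rendered message keeps "'{a}' is not defined." as a single bucket regardless of which variable name triggered it.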

Some notes:

  • This is heavyweight JavaScript code: we are running a static JavaScript code analyzer inside BigQuery, and it works. That’s pretty cool.
  • I’m running this code over a sample of all JS files (see the query for the current filters). There’s a lot that you can do with BigQuery and SQL, but as we push the boundaries, some code will run better over smaller datasets. In the meantime, it would be nice to have a lighter-weight JSHint equivalent.
  • I’m using JSHint 2.5.11 as newer versions fail. Ping me if you find out how to solve this.
  • The above query does not follow the officially supported BigQuery UDF syntax. See the docs for the correct format; I’m using this style because it’s easier to share this way.
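
For comparison, the documented legacy SQL UDF format registers a named function from the JavaScript side with `bigquery.defineFunction`, and the query then calls that function by name instead of going through an inline `js(...)` call. A rough sketch of the same analysis in that style follows. It assumes the JSHint library file is loaded alongside it so that `JSHINT` is a global; the name `jshintErrors` is hypothetical, and the `typeof` guard is an addition so the file can also be loaded outside BigQuery.

```javascript
// Hypothetical UDF body with the same logic as the inline function in the query.
// Assumes a global JSHINT provided by the JSHint library file.
function jshintErrors(r, emit) {
  JSHINT(r.content, { maxdepth: 2 });
  var seen = Object.create(null); // raw messages already emitted for this file
  (JSHINT.errors || []).forEach(function (e) {
    if (!e || !e.raw || seen[e.raw]) { return; }
    seen[e.raw] = true;
    emit({
      x: e.raw,
      sample_repo_name: r.sample_repo_name,
      sample_path: r.sample_path
    });
  });
}

// Registration in the documented legacy SQL style.
// The guard is only there so this file loads where `bigquery` is undefined.
if (typeof bigquery !== 'undefined') {
  bigquery.defineFunction(
    'jshintErrors',                                  // name used from SQL
    ['content', 'sample_path', 'sample_repo_name'],  // input columns
    [{ name: 'x', type: 'string' },                  // output schema
     { name: 'sample_path', type: 'string' },
     { name: 'sample_repo_name', type: 'string' }],
    jshintErrors                                     // the function above
  );
}
```

With this registered, the outer query would presumably select FROM jshintErrors(...) around the same subquery, and the rest stays unchanged. The schema here omits the `content` column, which the inline version declares but never emits.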

More resources for GitHub on BigQuery: https://medium.com/@hoffa/b3576fd2b150

