The 2018 World Cup Visualized: All the Goals So Far

Check out all the goals so far with this Data Studio interactive visualization. Data extracted using BigQuery. keep reading to learn more.

Each dot shows where the player was when they scored.

On the left: Own-goals (when a team accidentally kicks the ball into their own goal) are attributed to the team that auto-inflicted them.

On the right: Interactive controls to visualize only the teams you care about.

Check out Eric Schmidt’s Analyzing the World Cup using Google Cloud series of posts for even more info on this.

And a follow up, also complete with an interactive dashboard to check the most valuable players so far:

Counter fact-checking

Reddit wasn’t sure about these results. So I counter-fact-checked them.

All the Brazilian goals

All the Brazilian goals, first round. Pretty good accuracy, IMHO

How I Built this Visualization

Data acquisition

This chart is powered with the same source we used to predict the 2014 World Cup results. We can’t re-share the raw data, but contact Opta Sports if you’d like to license the feed from them.

Extracting the goals

Note that:

  • Joins to get the team names out of team ids were a little tricky.
  • Own-goals are attributed to the team that dealt the self-damage, not to the ones that the goals count for.

Here’s the query:

#standardSQLWITH team_names AS (
SELECT team_id, MIN(name) name
FROM (
SELECT away_team_id team_id, away_team_name name
FROM `cloude-sandbox.galacticos.games`
WHERE competition_id = 4
UNION ALL
SELECT home_team_id team_id, home_team_name name
FROM `cloude-sandbox.galacticos.games`
WHERE competition_id = 4
)
GROUP BY 1
)
SELECT a.* EXCEPT(game_id, player_id), b.name versus
, (SELECT MAX(name) FROM `cloude-sandbox.galacticos.sqauds` WHERE player_id = a.player_id) player
FROM (
SELECT id, DATE(event_timestamp) date, x, y, min minute
, (SELECT name FROM team_names WHERE team_id=a.team_id) team
, game_id, CONCAT('p', CAST(player_id AS STRING)) player_id
FROM `cloude-sandbox.galacticos.events` a
WHERE competition_id = 4
AND type_id IN (16)
AND DATE(event_timestamp) > '2018-01-01'
) a
JOIN (
SELECT id, home_team_name name FROM `cloude-sandbox.galacticos.games` UNION ALL SELECT id, away_team_name name FROM `cloude-sandbox.galacticos.games`
) b
ON a.game_id = b.id
AND a.team != b.name
ORDER BY 1, 2

Scheduling result extraction to a Google sheet

I’m using a modified Ani Lopez’s easy BigQuery scheduling sheet. Instead of saving the results to a query, I place them in the same sheet.

Connect Data Studio to the sheet

Instead of having Data Studio read results from BigQuery, I’ll point Data Studio to the previous step sheet.

Visualize

That’s the Data Studio magic :).

And for a more comprehensive Data Studio dashboard covering the cup, check Will Sherman’s.

Refresh

New goals should show up as the dataset gets updated and the query in that sheet runs.

Next steps

Check out how Max Woolf did a heatmap with all the NCAA’s shots (with data in BigQuery too).

Want more stories? Check my Medium, follow me on twitter, and subscribe to reddit.com/r/bigquery. And try BigQuery — every month you get a full terabyte of analysis for free.

Data Cloud Advocate at Snowflake ❄️. Originally from Chile, now in San Francisco and around the world. Previously at Google. Let’s talk data.