Advanced sports visualization with Pandas, Matplotlib and Seaborn

But was he even that bad?

Let’s look at his performance in the widely criticized match against South Korea. I want to plot a heat map and a pass map to capture his performance over the 90 minutes and to evaluate the influence, positive or negative, that he exerted on Germany’s attack.

Let’s start with a Pass Map

We load the JSON file and do some basic data cleaning in pandas to get a dataset that contains only passing events by Mesut Özil.
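Here is a minimal sketch of that loading and filtering step. It assumes a StatsBomb-style event file, where each event is a dictionary with nested `type`, `player` and `pass` entries; the function names, the column names and the tiny sample below are illustrative, not taken from the original notebook.

```python
import json

import pandas as pd


def load_events(path):
    """Read a match's list of events from a JSON file (the path is up to you)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def extract_passes(events, player_name):
    """Return one row per pass by `player_name`, with start and end coordinates."""
    # Flatten nested dicts: {"type": {"name": ...}} becomes the column "type_name".
    df = pd.json_normalize(events, sep="_")
    is_pass = (df["type_name"] == "Pass") & (df["player_name"] == player_name)
    passes = df.loc[is_pass].copy()
    # Coordinates are assumed to be stored as [x, y] lists; split them up.
    passes["x"] = [loc[0] for loc in passes["location"]]
    passes["y"] = [loc[1] for loc in passes["location"]]
    passes["end_x"] = [loc[0] for loc in passes["pass_end_location"]]
    passes["end_y"] = [loc[1] for loc in passes["pass_end_location"]]
    passes = passes.rename(columns={"pass_recipient_name": "recipient"})
    columns = ["minute", "period", "x", "y", "end_x", "end_y", "recipient"]
    return passes[columns].reset_index(drop=True)


# Made-up events in the assumed layout, only to show the call.
sample_events = [
    {"type": {"name": "Pass"}, "player": {"name": "Mesut Özil"},
     "minute": 3, "period": 1, "location": [70.0, 40.0],
     "pass": {"end_location": [85.0, 30.0], "recipient": {"name": "Toni Kroos"}}},
    {"type": {"name": "Shot"}, "player": {"name": "Timo Werner"},
     "minute": 10, "period": 1, "location": [105.0, 38.0]},
    {"type": {"name": "Pass"}, "player": {"name": "Toni Kroos"},
     "minute": 12, "period": 1, "location": [60.0, 45.0],
     "pass": {"end_location": [75.0, 50.0], "recipient": {"name": "Mesut Özil"}}},
]
passes = extract_passes(sample_events, "Mesut Özil")
```

With a real file you would call `extract_passes(load_events(path), "Mesut Özil")` instead of using the sample list.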

What our condensed dataset looks like. You can extract more information: pass_complete_status, etc.

This dataset is already very informative. For example, one can find that Özil attempted as many as 95 passes in the match, up to 7 of them incisive, which is pretty impressive for an attacking midfielder, and that he passed most often to Toni Kroos (19 times) and Marco Reus (18 times). For the purpose of the pass map, we only care about the starting and ending location of each pass.

The code below allows us to overlay the passes as arrows onto our pitch.

Looks pretty good, but we can do even better. I will come back to how a small tweak can make this plot a lot more informative.
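Counts like these come from simple aggregations on the pass table. The sketch below assumes a `recipient` column (a name chosen here for illustration) and uses a few made-up rows rather than the real match data.

```python
import pandas as pd

# Illustrative rows only; the real table has one row per pass attempted.
passes = pd.DataFrame(
    {"recipient": ["Toni Kroos", "Marco Reus", "Toni Kroos", None]}
)

# Total passes attempted is just the number of rows.
passes_attempted = len(passes)

# value_counts() ignores rows with a missing recipient and sorts by frequency.
passes_by_recipient = passes["recipient"].value_counts()
print(passes_attempted)
print(passes_by_recipient)
```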
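A minimal sketch of such an overlay, under a few assumptions: the pitch is 120 x 80 units with the y origin at the top (as in StatsBomb-style data), the pass table has `x`, `y`, `end_x` and `end_y` columns, and `draw_pitch_outline` is a bare-bones stand-in for a full pitch-drawing function. The two passes are made up.

```python
import matplotlib.pyplot as plt
import pandas as pd


def draw_pitch_outline(ax, length=120, width=80):
    """Hypothetical stand-in for a full pitch: touchlines and halfway line only."""
    ax.plot([0, length, length, 0, 0], [0, 0, width, width, 0], color="black")
    ax.plot([length / 2, length / 2], [0, width], color="black")
    ax.set_xlim(-2, length + 2)
    ax.set_ylim(width + 2, -2)  # flipped, since the y origin is assumed to be at the top
    ax.set_aspect("equal")
    ax.axis("off")
    return ax


def plot_pass_arrows(ax, passes, color="blue"):
    """Draw one arrow per pass, from its start location to its end location."""
    for p in passes.itertuples(index=False):
        ax.annotate(
            "",
            xy=(p.end_x, p.end_y),   # arrow head
            xytext=(p.x, p.y),       # arrow tail
            arrowprops=dict(arrowstyle="->", color=color),
        )
    return ax


# Made-up passes, only to show the call.
passes = pd.DataFrame(
    {"x": [70.0, 90.0], "y": [40.0, 25.0], "end_x": [85.0, 104.0], "end_y": [30.0, 38.0]}
)
fig, ax = plt.subplots(figsize=(12, 8))
draw_pitch_outline(ax)
plot_pass_arrows(ax, passes)
# fig.savefig("pass_map.png") or plt.show() to view the result
```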

Tracking active zone with a heat map

Football heatmaps are used by in-club and media analysts to illustrate the area within which a player has been present. They are effectively a smoothed-out scatter plot of player locations and can be a good indicator of how active a player is in different parts of the field. While there may be some debate as to how useful they are (they don’t tell you whether actions or movements are a good or a bad thing!), they can often be very aesthetically pleasing and engaging, hence their popularity.

Now if we go back to Mesut Özil, one of the main criticisms he faces is his low field coverage: we rarely see him launching into tackles or fighting for possession, hence the “low work rate” label.

But is that really the case?

Let’s plot a heat map using Seaborn on top of Matplotlib to visualize Mesut Özil’s involvement during the 90 minutes of the Germany-Korea match. The syntax of the code is incredibly simple. We use a kdeplot, which draws a kernel density estimate of the scattered points of Özil’s locations.

Wow!!! That looks very… anticlimactic. After all, what is the graph trying to tell you? I see some coordinates, and these contour-like plots do seem to indicate that Özil is more active in the areas with darker color.

Can we do any better than that?

Yes, we can: by combining (1) the pitch, (2) the pass map and (3) the heat map, we get a more comprehensive view of Özil’s performance during the game.

Notice that I also color the passes differently: the blue arrows indicate passes made in the first half, and the red arrows passes made in the second half.

Now we can see a more comprehensive picture of Mesut Özil’s performance during the game. A couple of observations right off the bat:

  • He covered almost exclusively the opponent’s half, so criticisms of his lack of a defensive mindset are not completely unfounded. But the question is, is he expected to win one-on-one duels and recover the ball as a CAM?
  • He made a lot more forward and direct passes in the second half, in contrast to the larger number of conservative, backward passes he made in the first half. There could be two reasons: (1) there was a general sense of urgency within the Germany team in the second half; (2) the introduction of Mario Gómez as a central forward produced an outlet for Özil’s key passes, as we see a total of 6 passes directly into the penalty area, three times as many as in the first half.

What I found interesting was the heat-pass map of Timo Werner, who started out as the lone striker for the Germany team and then paired up with Mario Gómez for much of the second half.

Surprisingly, he spent a lot of his time on the two flanks, while you would expect a central forward to occupy the space in the 18-yard box a lot more. This partly explains the ineffectiveness of Germany’s attack during the game: their forwards (Werner, Reus, Goretzka and then Müller, Gómez) crowded the wings but failed to take up space in the penalty area, thus providing very little outlet for playmakers such as Özil and Kroos to direct the ball into the 18-yard box.

4. Testing your skills: The case of France’s long-range efforts

A friend of mine was convinced that the key to France’s successful World Cup was their relentless attempts to break down defensive lines with long-range efforts. Think of that stunning goal Benjamin Pavard scored against Argentina in the Round of 16.

Were long-range efforts such as Pavard’s stunner key to the success of France?

We can again visualize all shots by the France team to see whether the majority of their goals came from inside or outside the box.

If I just follow the methods shown thus far, this is what I get:

Shots taken by the France team during the World Cup campaign

This is fine. But we can do more to make the visualization more engaging and insightful. Specifically, I made two small tweaks:

(1) Since we focus only on the shots, which are all recorded on one side of the pitch, I will draw only the right half of the pitch.

(2) Since we care only about the starting points of the shots, we can toss away the arrows and visualize the shots as a scatter plot, where x and y are the location from which each shot was attempted.

Now this looks a whole lot better. We can see right away that France attempted as many shots inside the box as they did outside the penalty area. To a certain extent, this does support the argument that France took more long-range efforts than usual, as we would expect a much lower density of shots outside the box. In any case, it is interesting how they seem equally clinical with short- and long-range efforts.
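The two tweaks can be sketched as follows. The shots below are made up for illustration, the `is_goal` column is a hypothetical flag, and the penalty-area bounds (x from 102 to 120, y from 18 to 62) assume a 120 x 80 coordinate system with the attacked goal on the right.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up shots, only to show the idea; real data has one row per shot.
shots = pd.DataFrame({
    "x": [108.0, 95.0, 112.0, 88.0],
    "y": [40.0, 30.0, 45.0, 50.0],
    "is_goal": [True, False, False, True],
})

# Shots inside the penalty area, under the assumed pitch dimensions.
in_box = (shots["x"] >= 102) & shots["y"].between(18, 62)
print("inside the box:", int(in_box.sum()), "outside:", int((~in_box).sum()))

fig, ax = plt.subplots(figsize=(6, 8))
# Goals in red, other shots in gray.
colors = ["red" if goal else "gray" for goal in shots["is_goal"]]
ax.scatter(shots["x"], shots["y"], c=colors, s=60)
# Outline of the penalty area for reference.
ax.plot([120, 102, 102, 120], [18, 18, 62, 62], color="black")
# Tweak (1): show only the right half of the pitch.
ax.set_xlim(60, 120)
ax.set_ylim(80, 0)  # flipped, assuming the y origin is at the top
ax.set_aspect("equal")
```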

In my Jupyter notebook, you can also find further techniques such as overlaying a density plot and including an image in your visualization. With a couple more lines of code, you can easily produce this visualization:
