EDA (Exploratory Data Analysis) on the English Premier League (football).

1. Introduction

2. Importing the libraries and loading the CSV.

# Import the libraries
import pandas as pd
import numpy as np
import os
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
import warnings
warnings.filterwarnings("ignore")
# Set the working directory to the folder that contains the dataset
os.chdir("C:/Users/Abhishek/Desktop/Data Sets/premier-league")
# Load the Dataset.
masterdata=pd.read_csv("results93-18.csv")

3. As stated earlier, we will use data from only one season here (the 2003–2004 season).

# We need to make a subset of this data with only the specified season.
data=masterdata[masterdata["Season"]=="2003-04"]

4. Exploring the File Imported.

# Check Shape of the data
data.shape
# Check Datatype of the Columns
data.info()
#Check for missing values
data.isna().sum()
# Check Details for all numeric variables
data.describe()

Observations:

  1. Count : Shows the number of values present in respective columns.
  2. Mean: Mean of all the values present in the respective columns.
  3. Std: Standard Deviation of the values present in the respective columns.
  4. Min: The minimum value in the column.
  5. 25%: Gives the 25th percentile value.
  6. 50%: Gives the 50th percentile value (the median).
  7. 75%: Gives the 75th percentile value.
  8. Max: The maximum value in the column.
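To make the rows of describe() concrete, here is a minimal sketch on a made-up five-value series. The values and the name `goals` are hypothetical and are not taken from the dataset; the point is only to show how each row of the summary is read.

```python
import pandas as pd

# Hypothetical goal counts for five matches (illustration only, not real data)
goals = pd.Series([0, 1, 1, 2, 3], name="FTHG")

# describe() returns count, mean, std, min, the quartiles and max
summary = goals.describe()
print(summary)

# The 50% row is the median of the column
print(summary["50%"] == goals.median())
```

Note that the std row is the sample standard deviation (pandas divides by n-1 by default).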

5. Check if the Data set is Balanced.

# Check if the Dataset is Balanced
data["FTR"].value_counts()
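Raw counts can be hard to compare at a glance, so it may help to look at shares instead. The sketch below uses a small hypothetical frame named `sample` in place of the season data (H = home win, D = draw, A = away win); with the real data, the same call would be made on data["FTR"].

```python
import pandas as pd

# Hypothetical stand-in for the FTR column (illustration only, not real results)
sample = pd.DataFrame({"FTR": ["H", "H", "A", "D", "H", "A", "D", "H"]})

# normalize=True returns the share of each class instead of the raw count
ftr_share = sample["FTR"].value_counts(normalize=True)
print(ftr_share.round(3))
```

If the shares are far from equal, the classes are imbalanced, which matters if FTR is later used as a prediction target.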

6. Univariate Analysis

6.1 Counts for FTR (Full Time Result)

sns.countplot(data=data,x="FTR",palette="winter")
plt.xlabel("FTR",size=15,color="Black")
plt.ylabel("Count",size=15,color="Black")

Observations:

  1. Just from the counts, we can see that the home team wins significantly more often.
  2. Put another way, the away team is more likely to end up with a draw or a loss.

6.2 Probability Density Function (PDF)

sns.set_style("whitegrid")
# `height` replaces the older `size` argument (renamed in seaborn 0.9)
a=sns.FacetGrid(data,height=4) \
.map(sns.distplot,"FTHG") \
.add_legend()
plt.xlim(0,6)
plt.xlabel("FTHG",size=15,color="Black")
plt.title("Goals Scored at FT by Home team")
plt.show()
sns.set_style("whitegrid")
sns.FacetGrid(data,height=4) \
.map(sns.distplot,"FTAG") \
.add_legend()
plt.xlabel("FTAG",size=15,color="Black")
plt.title("Goals Scored at FT by Away team")
plt.xlim(0,6)
plt.show()

Observations:

  1. Most of the time, both the home and the away team score 1 goal. The most frequent scores are 1, 0 and 2, in that order. The away team is slightly ahead here.
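A density plot smooths over the fact that goals are whole numbers, so one way to double-check an observation like this is to count how often each score occurs. The sketch below is a minimal illustration on a hypothetical frame named `sample`; the numbers are made up and do not reproduce the season's results.

```python
import pandas as pd

# Hypothetical full-time goals for seven matches (illustration only)
sample = pd.DataFrame({
    "FTHG": [1, 0, 1, 2, 1, 0, 3],
    "FTAG": [1, 1, 0, 1, 2, 1, 0],
})

# value_counts() sorts by frequency, so the first index entry is the most common score
home_counts = sample["FTHG"].value_counts()
away_counts = sample["FTAG"].value_counts()

print("Most common home score:", home_counts.index[0])
print("Most common away score:", away_counts.index[0])
```

With the real data, the same two calls on data["FTHG"] and data["FTAG"] would show the exact frequency of each score.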

6.3 Box Plots

sns.boxplot(data=data,x="FTR",y="FTHG",palette="inferno_r", \
meanline=True,showmeans=True,\
meanprops={"marker":"^","markerfacecolor":"white", "markeredgecolor":"blue","color":"White"})
plt.title("Result and Goals Scored by Home Team")
plt.xlabel("FTR",size=15,color="Black")
plt.ylabel("FTHG",size=15,color="Black")
plt.show()

sns.boxplot(data=data,x="FTR",y="FTAG",palette="inferno", \
meanline=True,showmeans=True,\
meanprops={"marker":"^","markerfacecolor":"white", "markeredgecolor":"blue","color":"White"})
plt.title("Result and Goals Scored by Away Team")
plt.xlabel("FTR",size=15,color="Black")
plt.ylabel("FTAG",size=15,color="Black")
plt.show()

Observations:

7. Bivariate Analysis

  1. Bivariate analysis is one of the simplest forms of quantitative analysis. It looks at two variables together to see how they relate to each other.

7.1 Pair Plots

sns.pairplot(data,hue="FTR")

Observations:

  • FTHG and FTAG are the values that clearly indicate who will win, so studying these 2 variables is the best way to predict FTR.
  • Whichever of the 2 has the higher value, that team wins. In other words, the team that scores more goals by full time wins the match, which is simply how football works.
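The rule in these observations can be written down directly. The sketch below derives a result column from the two goal columns on a hypothetical frame named `matches`; the column name FTR_derived is introduced here for illustration and is not part of the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical full-time scores for four matches (illustration only)
matches = pd.DataFrame({"FTHG": [2, 0, 1, 3], "FTAG": [1, 0, 2, 3]})

# More home goals -> "H", more away goals -> "A", otherwise a draw "D"
conditions = [
    matches["FTHG"] > matches["FTAG"],
    matches["FTHG"] < matches["FTAG"],
]
matches["FTR_derived"] = np.select(conditions, ["H", "A"], default="D")
print(matches)
```

On the real data, comparing such a derived column with FTR is a quick consistency check. It also shows why FTHG and FTAG are not useful predictors in practice: they are only known once the match is over.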

8. Check whether a team that is winning at half time is still winning at full time.

# Check whether a team that is winning at half time is still winning at full time.
sns.countplot(data=data,x="HTR",hue="FTR")
plt.legend(edgecolor="White",facecolor="White")
plt.xlabel("HTR",size=12,color="Black")
plt.ylabel("Count",size=12,color="Black")

H : Shows the Home team leading at Half Time.
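The count plot can be complemented with a table of row shares, which answers the question directly: given the half-time result, how often does each full-time result follow? The sketch below uses a hypothetical frame named `sample`; the values are made up, and with the real data the same call would take data["HTR"] and data["FTR"].

```python
import pandas as pd

# Hypothetical half-time and full-time results for eight matches (illustration only)
sample = pd.DataFrame({
    "HTR": ["H", "H", "H", "D", "D", "A", "A", "H"],
    "FTR": ["H", "H", "D", "D", "A", "A", "H", "H"],
})

# normalize="index" turns each row into shares that sum to 1,
# so row "H" reads as: of the matches the home team led at half time,
# what share ended as H, D or A
table = pd.crosstab(sample["HTR"], sample["FTR"], normalize="index")
print(table.round(2))
```

The diagonal of this table shows how often the half-time result holds until full time.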

Observations:

Conclusion :
