Visualization of plane crashes

Assignment 1

Based on the attached dataset, write Python code that meets the requirements below:

  1. Visualise the data by creating histograms for the dead, the survivors, the safest planes, and the most dangerous planes.
  2. Visualise the data by creating a graph of the years with the most plane crashes.

Report must have

  1. introduction of the dataset
  2. key challenges or problems to be addressed using the dataset
  3. use of visualisation to analyse the data
  4. reflection on the methods used for analysis

Solution

 Visualizations- Plane Crashes.py

 # coding: utf-8

# In[1]:

import pandas as pd

import numpy as np

import seaborn as sns

import matplotlib.pyplot as plt

df = pd.read_csv('Crashes.csv')

a=df.isnull().sum()

# ### Number of missing values compared across columns

plt.figure()

a.plot(kind='bar', color='yellow')

plt.xlabel('Feature Vectors')

plt.ylabel('Number of missing entries')

plt.show()

# ## Crash counts for the 10 operators with the most crashes; Aeroflot and the US Air Force have suffered the highest number of crashes

b=df.Operator.value_counts()[:10]

plt.figure()

b.plot(kind='bar', color='red')

plt.xlabel('Operators')

plt.ylabel('Crash Count')

plt.show()

# ## The Douglas DC-3 has suffered the highest number of crashes

b=df.Type.value_counts()[:10]

plt.figure()

b.plot(kind='bar', color='green')

plt.xlabel('Type of Aircraft')

plt.ylabel('Crash Count')

plt.show()

# ## Number of people aboard each aircraft; the data is binned for easier interpretation

# In[6]:

X=df.Aboard.dropna()

bins = [0, 5, 20,50,100,900]

group_names = ['below 5', '5-20', '20-50', '50-100', '100 above']

categories = pd.cut(X, bins, labels=group_names)

plt.figure()

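# Pass the binned values as x; recent seaborn versions treat the first positional argument as data
sns.countplot(x=categories)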

plt.show()
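
To make the behaviour of the binning step at the bin edges concrete, here is a minimal sketch that applies the same bins and labels to a handful of values. The helper name bin_aboard and the sample values are made up for illustration and do not come from Crashes.csv. By default pd.cut uses right-closed intervals, so a value of 5 is labelled 'below 5' and a value of 0 falls outside every bin unless include_lowest=True is passed.

```python
import pandas as pd


def bin_aboard(aboard, include_lowest=False):
    """Bin passenger counts with the same edges and labels as the script (hypothetical helper)."""
    bins = [0, 5, 20, 50, 100, 900]
    group_names = ['below 5', '5-20', '20-50', '50-100', '100 above']
    return pd.cut(aboard, bins, labels=group_names, include_lowest=include_lowest)


# Invented sample values chosen to sit on or near the bin edges
sample = pd.Series([0, 5, 6, 50, 101])

default_bins = bin_aboard(sample)                         # intervals are (0, 5], (5, 20], ...
inclusive_bins = bin_aboard(sample, include_lowest=True)  # first interval also includes 0

print(default_bins.tolist())
print(inclusive_bins.tolist())
```

Whether rows with zero people aboard should be counted is a judgment call for the report; the sketch only shows that the default call leaves them unbinned.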

# ## Around 3500 of the crashes led to the death of everyone aboard

# In[7]:

df['Alive'] = df['Aboard'] - df['Fatalities']

d=df.Alive.value_counts()[0:5]

plt.figure()

d.plot(kind='bar', color='blue')

plt.xlabel('number of people who survived the crash')

plt.ylabel('Count')

plt.show()
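
The number of crashes with no survivors can also be computed directly rather than read off the bar chart. The sketch below is one possible way to do that, assuming the Aboard and Fatalities columns used in the script; the helper name count_no_survivor_crashes and the sample rows are invented for illustration. Rows where either value is missing produce NaN and are not counted.

```python
import pandas as pd


def count_no_survivor_crashes(df):
    """Count rows where everyone aboard died (hypothetical helper)."""
    alive = df['Aboard'] - df['Fatalities']
    # NaN == 0 evaluates to False, so rows with missing values are excluded
    return int((alive == 0).sum())


# Invented sample rows, not taken from Crashes.csv
sample = pd.DataFrame({
    'Aboard': [2, 10, 5, None],
    'Fatalities': [2, 4, 5, 1],
})

no_survivors = count_no_survivor_crashes(sample)
print(no_survivors)
```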

# ## Maximum deaths through plane crashes were recorded in the year 1972

df['year'] = df.Date.apply(lambda x: x.split('/')[2])

e = df[['year', 'Fatalities']].groupby('year').sum()

e.sort_values('Fatalities', inplace=True, ascending=False)

e=e[0:15]

plt.figure()

e.plot(kind='bar', color='blue')

plt.xlabel('Year')

plt.ylabel('Deaths by crashes')

plt.show()
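
The second requirement asks for the years with the most crashes, whereas the last plot ranks years by fatalities. A minimal sketch of a count-based version follows. It assumes the Date column holds MM/DD/YYYY strings, as the split on '/' in the script implies; the helper name crashes_per_year and the sample dates are invented for illustration.

```python
import pandas as pd


def crashes_per_year(dates):
    """Return the number of crashes per calendar year, indexed by year (hypothetical helper)."""
    # Assumes MM/DD/YYYY strings; a different date layout would need another format string
    years = pd.to_datetime(dates, format='%m/%d/%Y').dt.year
    return years.value_counts().sort_index()


# Invented sample dates, not taken from Crashes.csv
sample_dates = pd.Series(['01/15/1950', '03/02/1951', '07/30/1951', '11/11/1952'])

counts = crashes_per_year(sample_dates)
print(counts.idxmax(), counts.max())
```

Calling counts.plot(kind='bar') on the result would give a bar chart in the same style as the other figures.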