Plot mpg vs other factors


  1. Running ggplot(data = mpg) produces the blank plot shown below. ggplot() only sets up the plotting area; nothing is drawn until a layer such as geom_point() is added.
  1. mpg has 234 rows and 11 columns.
  2. The drv variable describes the type of drive train: f = front-wheel drive, r = rear-wheel drive, 4 = four-wheel drive.
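The three checks above can be reproduced in a few lines of R. This is a minimal sketch that assumes ggplot2 is installed, since it provides both ggplot() and the mpg data. Using dim() and table() is just one convenient way to confirm the size of the data and list the drv codes; the original answer does not prescribe them.

```r
# ggplot2 provides both ggplot() and the mpg data frame
library(ggplot2)

# Only the data is supplied, so this draws an empty panel with no layers
print(ggplot(data = mpg))

# Number of rows and columns of mpg
dim(mpg)

# The distinct drive-train codes and how many cars have each
table(mpg$drv)
```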

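Since class and drv are both categorical, a scatterplot of one against the other is not useful. Every car with the same combination of class and drv is drawn at exactly the same position, so the points overlap and the plot does not show how many cars each point represents.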
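The overlap can be made visible. The sketch below assumes ggplot2 is loaded. It uses geom_count() as one possible alternative that scales each point by the number of cars sitting at that position. Counting the distinct (class, drv) pairs shows how many positions the 234 cars actually collapse onto.

```r
library(ggplot2)

# Scatterplot of two categorical variables: every car with the same
# (class, drv) pair is drawn at exactly the same position
p_points <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = class, y = drv))
print(p_points)

# geom_count() scales each point by the number of overlapping cars,
# which makes the hidden counts visible
p_count <- ggplot(data = mpg) +
  geom_count(mapping = aes(x = class, y = drv))
print(p_count)

# Number of distinct (class, drv) positions the 234 cars are drawn at
n_positions <- nrow(unique(mpg[, c("class", "drv")]))
n_positions
```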

  1. The mpg data frame has 234 rows and 11 variables. The variables are:
    1. manufacturer: manufacturer name
    2. model: model name
    3. displ: engine displacement, in litres
    4. year: year of manufacture
    5. cyl: number of cylinders
    6. trans: type of transmission
    7. drv: type of drive train, where f = front-wheel drive, r = rear-wheel drive, 4 = four-wheel drive
    8. cty: city miles per gallon
    9. hwy: highway miles per gallon
    10. fl: fuel type
    11. class: "type" of car
  1. Categorical: manufacturer, model, trans, drv, fl, class

Continuous: displ, year, cyl, cty, hwy

Categorical variables are of type chr, whereas continuous variables are of type dbl or int. When mpg is printed, these types appear in the row directly below the column names.
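The chr versus dbl/int rule can also be applied programmatically. This is a minimal sketch that assumes ggplot2 is loaded. It looks at the first class of each column, where R reports dbl columns as "numeric". By this rule year, which is stored as int, lands on the continuous side together with cyl, even though both are arguably discrete values stored as integers.

```r
library(ggplot2)

# First class of each column: "character" for chr, "numeric" for dbl, "integer" for int
col_types <- vapply(mpg, function(col) class(col)[1], character(1))
print(col_types)

# Split the column names using the chr vs dbl/int rule
categorical <- names(col_types)[col_types == "character"]
continuous  <- names(col_types)[col_types %in% c("numeric", "integer")]
categorical
continuous
```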

  1. The str() function compactly displays the internal structure of an R object. For each variable it gives the type (for example int, num or chr) followed by the first few observations. Ideally, only one line is displayed for each 'basic' structure.

Running ?mpg, on the other hand, opens the help page for the dataset, which describes the data and explains what each variable name means.
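Both ways of inspecting mpg can be compared in an R session. This is a short sketch that assumes ggplot2 is installed. The help call is wrapped in interactive() only so the script also runs non-interactively. That guard is a convenience added here and is not part of the original answer.

```r
library(ggplot2)

# One line per column: name, type (chr, num, int) and the first few values
str(mpg)

# The help page explains what each variable means; opened only in an interactive session
if (interactive()) help("mpg", package = "ggplot2")
```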

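  1. The points are not blue because color = "blue" was placed inside aes() instead of being passed to geom_point() directly. Inside aes(), "blue" is treated as a data value: it becomes a constant categorical variable that is mapped to colour. The points therefore receive the default colour for that single level, and a legend is added. So when we run the code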

ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))

we get the following graph
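You can confirm what happened by inspecting the colours ggplot2 actually computed for the points. This is a sketch that assumes a recent ggplot2 providing layer_data(), which returns the built data for a layer with colour stored in a column named colour. The mapped constant receives a single non-blue colour from the default discrete scale.

```r
library(ggplot2)

# "blue" inside aes() is mapped as a constant data value, not set as a colour
p_wrong <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))

# Colours the default discrete scale assigned to the points
colours_used <- unique(layer_data(p_wrong)$colour)
colours_used
```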

  1. The correct code to make the points appear blue is:

ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

The outcome is the following graph:
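The difference between setting and mapping an aesthetic can be checked side by side. This is a sketch that assumes a recent ggplot2 in which layer_data() already includes fixed aesthetics such as color = "blue". Mapping class is used here only as an illustrative contrast: a fixed value gives every point the same colour with no legend, while a mapped variable gets one colour per level.

```r
library(ggplot2)

# Setting: color outside aes() is a fixed property applied to every point
p_set <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

# Mapping: a variable inside aes() gets one colour per level, plus a legend
p_mapped <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class))

unique(layer_data(p_set)$colour)
length(unique(layer_data(p_mapped)$colour))
```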