Assignment - Blog Post 0

In this homework assignment, I write a short post for my new website. The main goal is to practice blogging with Jekyll and including Python code in a post.

Import the data

import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns 
import pandas as pd
url = "https://raw.githubusercontent.com/PhilChodrow/PIC16B/master/datasets/palmer_penguins.csv"
penguins = pd.read_csv(url)
# Then we take a brief look at the full penguins data
penguins
studyName Sample Number Species Region Island Stage Individual ID Clutch Completion Date Egg Culmen Length (mm) Culmen Depth (mm) Flipper Length (mm) Body Mass (g) Sex Delta 15 N (o/oo) Delta 13 C (o/oo) Comments
0 PAL0708 1 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen Adult, 1 Egg Stage N1A1 Yes 11/11/07 39.1 18.7 181.0 3750.0 MALE NaN NaN Not enough blood for isotopes.
1 PAL0708 2 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen Adult, 1 Egg Stage N1A2 Yes 11/11/07 39.5 17.4 186.0 3800.0 FEMALE 8.94956 -24.69454 NaN
2 PAL0708 3 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen Adult, 1 Egg Stage N2A1 Yes 11/16/07 40.3 18.0 195.0 3250.0 FEMALE 8.36821 -25.33302 NaN
3 PAL0708 4 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen Adult, 1 Egg Stage N2A2 Yes 11/16/07 NaN NaN NaN NaN NaN NaN NaN Adult not sampled.
4 PAL0708 5 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen Adult, 1 Egg Stage N3A1 Yes 11/16/07 36.7 19.3 193.0 3450.0 FEMALE 8.76651 -25.32426 NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
339 PAL0910 120 Gentoo penguin (Pygoscelis papua) Anvers Biscoe Adult, 1 Egg Stage N38A2 No 12/1/09 NaN NaN NaN NaN NaN NaN NaN NaN
340 PAL0910 121 Gentoo penguin (Pygoscelis papua) Anvers Biscoe Adult, 1 Egg Stage N39A1 Yes 11/22/09 46.8 14.3 215.0 4850.0 FEMALE 8.41151 -26.13832 NaN
341 PAL0910 122 Gentoo penguin (Pygoscelis papua) Anvers Biscoe Adult, 1 Egg Stage N39A2 Yes 11/22/09 50.4 15.7 222.0 5750.0 MALE 8.30166 -26.04117 NaN
342 PAL0910 123 Gentoo penguin (Pygoscelis papua) Anvers Biscoe Adult, 1 Egg Stage N43A1 Yes 11/22/09 45.2 14.8 212.0 5200.0 FEMALE 8.24246 -26.11969 NaN
343 PAL0910 124 Gentoo penguin (Pygoscelis papua) Anvers Biscoe Adult, 1 Egg Stage N43A2 Yes 11/22/09 49.9 16.1 213.0 5400.0 MALE 8.36390 -26.15531 NaN

344 rows × 17 columns

We select the columns that might be used:

penguins = penguins[["Species", "Island", "Culmen Length (mm)",
                     "Culmen Depth (mm)", "Flipper Length (mm)",
                     "Body Mass (g)", "Delta 15 N (o/oo)", "Delta 13 C (o/oo)"]]
penguins
Species Island Culmen Length (mm) Culmen Depth (mm) Flipper Length (mm) Body Mass (g) Delta 15 N (o/oo) Delta 13 C (o/oo)
0 Adelie Penguin (Pygoscelis adeliae) Torgersen 39.1 18.7 181.0 3750.0 NaN NaN
1 Adelie Penguin (Pygoscelis adeliae) Torgersen 39.5 17.4 186.0 3800.0 8.94956 -24.69454
2 Adelie Penguin (Pygoscelis adeliae) Torgersen 40.3 18.0 195.0 3250.0 8.36821 -25.33302
3 Adelie Penguin (Pygoscelis adeliae) Torgersen NaN NaN NaN NaN NaN NaN
4 Adelie Penguin (Pygoscelis adeliae) Torgersen 36.7 19.3 193.0 3450.0 8.76651 -25.32426
... ... ... ... ... ... ... ... ...
339 Gentoo penguin (Pygoscelis papua) Biscoe NaN NaN NaN NaN NaN NaN
340 Gentoo penguin (Pygoscelis papua) Biscoe 46.8 14.3 215.0 4850.0 8.41151 -26.13832
341 Gentoo penguin (Pygoscelis papua) Biscoe 50.4 15.7 222.0 5750.0 8.30166 -26.04117
342 Gentoo penguin (Pygoscelis papua) Biscoe 45.2 14.8 212.0 5200.0 8.24246 -26.11969
343 Gentoo penguin (Pygoscelis papua) Biscoe 49.9 16.1 213.0 5400.0 8.36390 -26.15531

344 rows × 8 columns

Observing the data

Since the data covers several species and islands, it helps to know exactly how many distinct species and islands appear in the dataset. Knowing this makes the later plots easier to read and may help us spot patterns. The unique method returns the distinct elements of an array; here it returns the distinct values in a column.

penguins["Island"].unique()
array(['Torgersen', 'Biscoe', 'Dream'], dtype=object)
penguins["Species"].unique()
array(['Adelie Penguin (Pygoscelis adeliae)',
       'Chinstrap penguin (Pygoscelis antarctica)',
       'Gentoo penguin (Pygoscelis papua)'], dtype=object)
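Besides listing the distinct values, it can be useful to count them. The sketch below uses a small made-up DataFrame (the name toy and its rows are assumptions for illustration, not the real data) to show nunique, which counts distinct values, and value_counts, which counts rows per value.

```python
import pandas as pd

# Made-up stand-in rows for illustration; the post itself uses
# the full penguins DataFrame loaded above.
toy = pd.DataFrame({
    "Species": ["Adelie", "Adelie", "Gentoo", "Chinstrap", "Adelie"],
    "Island": ["Torgersen", "Biscoe", "Biscoe", "Dream", "Dream"],
})

# nunique() gives the number of distinct values in a column
n_islands = toy["Island"].nunique()

# value_counts() gives the number of rows for each distinct value
species_counts = toy["Species"].value_counts()

print(n_islands)
print(species_counts)
```

Running the same two calls on penguins["Island"] and penguins["Species"] should give the corresponding counts for the real data.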

Preprocess the data

We drop the rows that contain NaN values with the dropna method:

penguins = penguins.dropna()
penguins
Species Island Culmen Length (mm) Culmen Depth (mm) Flipper Length (mm) Body Mass (g) Delta 15 N (o/oo) Delta 13 C (o/oo)
1 Adelie Penguin (Pygoscelis adeliae) Torgersen 39.5 17.4 186.0 3800.0 8.94956 -24.69454
2 Adelie Penguin (Pygoscelis adeliae) Torgersen 40.3 18.0 195.0 3250.0 8.36821 -25.33302
4 Adelie Penguin (Pygoscelis adeliae) Torgersen 36.7 19.3 193.0 3450.0 8.76651 -25.32426
5 Adelie Penguin (Pygoscelis adeliae) Torgersen 39.3 20.6 190.0 3650.0 8.66496 -25.29805
6 Adelie Penguin (Pygoscelis adeliae) Torgersen 38.9 17.8 181.0 3625.0 9.18718 -25.21799
... ... ... ... ... ... ... ... ...
338 Gentoo penguin (Pygoscelis papua) Biscoe 47.2 13.7 214.0 4925.0 7.99184 -26.20538
340 Gentoo penguin (Pygoscelis papua) Biscoe 46.8 14.3 215.0 4850.0 8.41151 -26.13832
341 Gentoo penguin (Pygoscelis papua) Biscoe 50.4 15.7 222.0 5750.0 8.30166 -26.04117
342 Gentoo penguin (Pygoscelis papua) Biscoe 45.2 14.8 212.0 5200.0 8.24246 -26.11969
343 Gentoo penguin (Pygoscelis papua) Biscoe 49.9 16.1 213.0 5400.0 8.36390 -26.15531

330 rows × 8 columns
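Before dropping rows, it may be worth checking where the missing values are. The following is a minimal sketch on a made-up DataFrame (toy and its values are assumptions for illustration); it counts the NaN entries per column with isna().sum() and then compares the row counts before and after dropna.

```python
import numpy as np
import pandas as pd

# Made-up stand-in rows for illustration, not the real dataset
toy = pd.DataFrame({
    "Species": ["Adelie", "Adelie", "Gentoo", "Gentoo"],
    "Body Mass (g)": [3750.0, np.nan, 4850.0, 5750.0],
    "Delta 15 N (o/oo)": [np.nan, np.nan, 8.41151, 8.30166],
})

# Number of missing values in each column
missing_per_column = toy.isna().sum()

# dropna() removes every row that has at least one NaN
cleaned = toy.dropna()
rows_dropped = len(toy) - len(cleaned)

print(missing_per_column)
print(rows_dropped)
print(list(cleaned.index))
```

Note that dropna keeps the original index labels of the surviving rows, which is why the output above starts at index 1 and skips index 3.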

Creating some plots

Now we know that the penguins fall into three species and live on three islands (species and islands do not necessarily correspond one to one). As a first step, it might be helpful to create some plots that show whether the measured features separate the penguins by island or by species.

The following plot inspects whether culmen length and culmen depth together show a pattern across the islands.
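One quick way to check how species and islands relate is a cross-tabulation. This is only a sketch on made-up rows (toy and its contents are assumptions for illustration); pd.crosstab counts how many rows fall into each species and island pair.

```python
import pandas as pd

# Made-up stand-in rows for illustration, not the real dataset
toy = pd.DataFrame({
    "Species": ["Adelie", "Adelie", "Adelie", "Gentoo", "Chinstrap"],
    "Island": ["Torgersen", "Biscoe", "Dream", "Biscoe", "Dream"],
})

# Rows are species, columns are islands, cells are row counts
table = pd.crosstab(toy["Species"], toy["Island"])
print(table)
```

A zero in a cell means that species was not recorded on that island in the rows given; applying the same call to the penguins DataFrame would show the pairing for the real data.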

# relplot plots the relationship between two variables:
# x and y name the columns to compare,
# data is the DataFrame to use,
# and hue names the column used to color the points
sns.relplot(x="Culmen Length (mm)", y="Culmen Depth (mm)",
            data=penguins, hue="Island")

blogpost0_16_2.png

Oops, it seems that culmen length and culmen depth cannot separate the penguins by island. Whatever other patterns exist, every island has penguins with culmen length < 45 mm and culmen depth > 16 mm.

Next, we plot the same features, but color the penguins by species.

sns.relplot(x="Culmen Length (mm)", y="Culmen Depth (mm)",
            data=penguins, hue="Species")

blogpost0_16_2.png

Surprisingly, the figure above shows that each species forms its own cluster in the combination of culmen length and culmen depth.
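To put numbers next to the picture, one option is to average the two culmen measurements per species. The sketch below uses only four rows copied from the tables shown earlier (two Adelie, two Gentoo), so the name toy is an assumption and the resulting means are not representative of the full dataset.

```python
import pandas as pd

# Four rows copied from the output shown earlier in this post;
# far too few to represent the whole dataset.
toy = pd.DataFrame({
    "Species": ["Adelie", "Adelie", "Gentoo", "Gentoo"],
    "Culmen Length (mm)": [39.5, 40.3, 46.8, 50.4],
    "Culmen Depth (mm)": [17.4, 18.0, 14.3, 15.7],
})

# Mean culmen length and depth for each species
features = ["Culmen Length (mm)", "Culmen Depth (mm)"]
summary = toy.groupby("Species")[features].mean().round(2)
print(summary)
```

Running the same groupby on the cleaned penguins DataFrame would give the per-species averages for all 330 rows.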

Written on April 6, 2021