Input/Output, String manipulation and 'plyr' package in R

The exercise starts by creating a dataset in a comma-separated text file. The ‘read.table’ function reads a table from a file into R as a data frame. ‘file.choose()’ opens a dialogue box for selecting the text file that was created earlier. ‘header=TRUE’ indicates that the first row contains the column names, and ‘sep=","’ tells R that the columns in the file are separated by commas.

> #Imports dataset into R and assigns as ‘x’
> x=read.table(file.choose(),header=TRUE,sep=",")
> x
        Name Age    Sex Grade
1       Raul  25   Male    80
2     Booker  18   Male    83
3      Lauri  21 Female    90
4     Leonie  21 Female    91
5    Sherlyn  22 Female    85
6    Mikaela  20 Female    69
7    Raphael  23   Male    91
8       Aiko  24 Female    97
9   Tiffaney  21 Female    78
10    Corina  23 Female    81
11 Petronila  23 Female    98
12    Alecia  20 Female    87
13   Shemika  23 Female    97
14    Fallon  22 Female    90
15   Deloris  21 Female    67
16    Randee  23 Female    91
17     Eboni  20 Female    84
18   Delfina  19 Female    93
19 Ernestina  19 Female    93
20      Milo  19   Male    67
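Because ‘file.choose()’ needs an interactive session, a script can pass a file path to ‘read.table’ directly instead. The following is only a minimal sketch under stated assumptions: the temporary file and the names ‘demo_path’ and ‘x_demo’ are hypothetical and not part of the original exercise, and only the first three rows of the table above are used.

```r
# Hypothetical stand-in for the text file picked in the dialogue box:
# the first three rows of the dataset, written to a temporary file.
demo_path <- tempfile(fileext = ".txt")
writeLines(c("Name,Age,Sex,Grade",
             "Raul,25,Male,80",
             "Booker,18,Male,83",
             "Lauri,21,Female,90"),
           demo_path)

# Same arguments as before, with an explicit path instead of file.choose()
x_demo <- read.table(demo_path, header = TRUE, sep = ",")

str(x_demo)   # shows the column types that read.table inferred
nrow(x_demo)  # counts the data rows (the header row is not included)
```

Passing the path explicitly makes the import repeatable, since nobody has to click through a dialogue box on each run.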

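The ‘plyr’ package follows a split-apply-combine approach: it splits the data into pieces, runs a function on each piece, and combines the results into a single data set. Its functions share the same syntax whether the data is an array, a data frame, or a list, so one package covers what would otherwise take a different function for each data type. This consistency makes ‘plyr’ a convenient choice for this task.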

> #installs 'plyr' package from CRAN
> install.packages("plyr")
> #loads package 'plyr'
> library(plyr)

The following code splits the table ‘x’ by the variable ‘Sex’ using the ‘ddply()’ function from the ‘plyr’ package. ‘transform’ is applied to each subset: it computes the average grade for that sex category and stores it in a new column, ‘Grade.Average’. The combined result is assigned to a new table, ‘y’. The mean grade is 86.9375 for females and 80.2500 for males.

> #Use plyr to add the mean Grade for each Sex as a new column
> y = ddply(x,"Sex",transform, Grade.Average=mean(Grade))
> y
        Name Age    Sex Grade Grade.Average
1      Lauri  21 Female    90       86.9375
2     Leonie  21 Female    91       86.9375
3    Sherlyn  22 Female    85       86.9375
4    Mikaela  20 Female    69       86.9375
5       Aiko  24 Female    97       86.9375
6   Tiffaney  21 Female    78       86.9375
7     Corina  23 Female    81       86.9375
8  Petronila  23 Female    98       86.9375
9     Alecia  20 Female    87       86.9375
10   Shemika  23 Female    97       86.9375
11    Fallon  22 Female    90       86.9375
12   Deloris  21 Female    67       86.9375
13    Randee  23 Female    91       86.9375
14     Eboni  20 Female    84       86.9375
15   Delfina  19 Female    93       86.9375
16 Ernestina  19 Female    93       86.9375
17      Raul  25   Male    80       80.2500
18    Booker  18   Male    83       80.2500
19   Raphael  23   Male    91       80.2500
20      Milo  19   Male    67       80.2500
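To see what the split-apply-combine step does, the same averages can be reproduced with base R. This is a sketch for illustration, not the method used in this post: ‘tapply()’ and ‘ave()’ are base R functions swapped in for ‘ddply()’, and the data frame ‘x_demo’ is a hypothetical copy of the ‘Sex’ and ‘Grade’ columns shown above.

```r
# Hypothetical copy of the Sex and Grade columns from the table 'x'
sex <- c("Male", "Male", rep("Female", 4), "Male", rep("Female", 12), "Male")
grade <- c(80, 83, 90, 91, 85, 69, 91, 97, 78, 81,
           98, 87, 97, 90, 67, 91, 84, 93, 93, 67)
x_demo <- data.frame(Sex = sex, Grade = grade)

# Split Grade by Sex and apply mean() to each group: one value per group
group_means <- tapply(x_demo$Grade, x_demo$Sex, mean)
group_means

# Combine: attach each row's group mean as a new column, as transform does
x_demo$Grade.Average <- ave(x_demo$Grade, x_demo$Sex, FUN = mean)
head(x_demo)
```

One difference to keep in mind: ‘ave()’ keeps the rows in their original order, whereas the ‘ddply()’ output above is grouped by ‘Sex’.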

The ‘write.table’ function writes the table ‘y’ to the file ‘Sorted_Average’.

> #Print this to a file
> write.table(y,"Sorted_Average")

Adding ‘sep=","’ separates the columns in the output file by commas, which produces a CSV file.

> #Generate a CSV(comma-separated values)
> write.table(y,"Sorted_Average",sep=",")
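By default, ‘write.table’ also writes the row names as an extra first column. If that column is not wanted, ‘row.names=FALSE’ leaves it out. The sketch below assumes a small hypothetical table ‘y_demo’ and a temporary file rather than the ‘Sorted_Average’ file used above, and reads the file back to check the round trip.

```r
# Hypothetical two-row table standing in for 'y'
y_demo <- data.frame(Name = c("Lauri", "Raul"),
                     Sex = c("Female", "Male"),
                     Grade.Average = c(86.9375, 80.25))

# Write to a temporary file so the example does not touch the working directory
out_path <- tempfile(fileext = ".csv")
write.table(y_demo, out_path, sep = ",", row.names = FALSE)

# Inspect the raw lines that were written
readLines(out_path)

# Read the file back with the same separator to confirm the round trip
y_back <- read.table(out_path, header = TRUE, sep = ",")
y_back
```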

The following code creates a new table, ‘newx’, containing the rows of ‘x’ whose names include the letter ‘I’ or ‘i’. The ‘subset()’ function selects the rows of the data frame that meet a condition, and the ‘grepl()’ function tests whether the pattern is found in each name.

> #Filter the names in the given list that contain the letter i or I.
> newx = subset(x,grepl("[iI]",x$Name))
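To see what ‘grepl()’ hands to ‘subset()’, it helps to run it on its own. This sketch uses a hypothetical vector ‘demo_names’ holding five names from the table; it also shows ‘ignore.case=TRUE’ as an alternative to the ‘[iI]’ character class, which is an addition and not part of the original code.

```r
# Five names taken from the table, held in a hypothetical vector
demo_names <- c("Raul", "Booker", "Lauri", "Aiko", "Milo")

# One TRUE/FALSE per name: does it contain "i" or "I"?
has_i <- grepl("[iI]", demo_names)
has_i

# Alternative spelling of the same test, ignoring letter case
has_i_alt <- grepl("i", demo_names, ignore.case = TRUE)

# The logical vector selects the matching names, as subset() does for rows
demo_names[has_i]
```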

The ‘write.table’ function writes the table ‘newx’ to the file ‘DataSubset’.
> #writes this subset to a file
> write.table(newx,"DataSubset",sep=",")

 


In this way, we can import data into R, write tables to files, filter rows by a text pattern, and compute the grade average for each group with the ‘plyr’ package.

URL to git repo: https://github.com/VedaVangala/vedas-r-repo/tree/main/R8

