Input/Output, String manipulation and 'plyr' package in R
✔This exercise starts by creating a dataset in a text file. The ‘read.table’ function reads a table from a file into R. ‘file.choose()’ opens a dialog box for selecting the text file that was created earlier. ‘header=TRUE’ indicates that the first row contains the column names, and ‘sep=","’ tells R that the columns in the file are separated by commas.
> #Imports the dataset into R and assigns it to ‘x’
> x=read.table(file.choose(),header=TRUE,sep=",")
> x
        Name Age    Sex Grade
1       Raul  25   Male    80
2     Booker  18   Male    83
3      Lauri  21 Female    90
4     Leonie  21 Female    91
5    Sherlyn  22 Female    85
6    Mikaela  20 Female    69
7    Raphael  23   Male    91
8       Aiko  24 Female    97
9   Tiffaney  21 Female    78
10    Corina  23 Female    81
11 Petronila  23 Female    98
12    Alecia  20 Female    87
13   Shemika  23 Female    97
14    Fallon  22 Female    90
15   Deloris  21 Female    67
16    Randee  23 Female    91
17     Eboni  20 Female    84
18   Delfina  19 Female    93
19 Ernestina  19 Female    93
20      Milo  19   Male    67
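To try the import step without a file dialog, the same ‘read.table’ arguments can be used on text typed directly into the script. This is only a minimal sketch: ‘csv_text’ and ‘sample_x’ are names made up for the illustration, and the three rows are copied from the table above.

```r
# Hypothetical inline sample: the first three rows of the dataset,
# typed in so the example runs without a file on disk.
csv_text <- "Name,Age,Sex,Grade
Raul,25,Male,80
Booker,18,Male,83
Lauri,21,Female,90"

# text= stands in for the file path; header and sep work the same way
# as in the file.choose() call.
sample_x <- read.table(text = csv_text, header = TRUE, sep = ",")

# Show the column names and the types read.table inferred.
str(sample_x)
```

Checking the result with ‘str()’ is a quick way to confirm that the header row became column names and that Age and Grade were read as numbers.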
✔The ‘plyr’ package lets you split a data set, run a function on each piece, and combine the results back into one object. The syntax is the same across all its functions, so there is no need to learn or load a different tool for each data type. This consistency makes the 'plyr' package a convenient choice for this task.
> #installs 'plyr' package from CRAN
> install.packages("plyr")
> #loads package 'plyr'
> library(plyr)
✔Using the following code, the table 'x' is split by the variable 'Sex' using the 'ddply()' function from the 'plyr' package. 'transform' is applied to each subset, so the grade average for each sex category is computed and added as a new column, 'Grade.Average'. The resulting table is assigned to the variable ‘y’. This gives a mean grade of 86.9375 for females and 80.2500 for males.
> #Use plyr to compute the mean Grade, split by Sex
> y = ddply(x,"Sex",transform,Grade.Average=mean(Grade))
> y
        Name Age    Sex Grade Grade.Average
1      Lauri  21 Female    90       86.9375
2     Leonie  21 Female    91       86.9375
3    Sherlyn  22 Female    85       86.9375
4    Mikaela  20 Female    69       86.9375
5       Aiko  24 Female    97       86.9375
6   Tiffaney  21 Female    78       86.9375
7     Corina  23 Female    81       86.9375
8  Petronila  23 Female    98       86.9375
9     Alecia  20 Female    87       86.9375
10   Shemika  23 Female    97       86.9375
11    Fallon  22 Female    90       86.9375
12   Deloris  21 Female    67       86.9375
13    Randee  23 Female    91       86.9375
14     Eboni  20 Female    84       86.9375
15   Delfina  19 Female    93       86.9375
16 Ernestina  19 Female    93       86.9375
17      Raul  25   Male    80       80.2500
18    Booker  18   Male    83       80.2500
19   Raphael  23   Male    91       80.2500
20      Milo  19   Male    67       80.2500
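The two group means can be cross-checked without ‘plyr’. The sketch below is not the method used in this post; it swaps in base R's ‘tapply()’ and ‘ave()’ as a rough equivalent. The data frame ‘grades’ is a name made up for the illustration, holding the Sex and Grade columns copied from the table ‘x’.

```r
# Sex and Grade columns copied from the table 'x' shown earlier,
# in the original row order (hypothetical stand-in for 'x').
grades <- data.frame(
  Sex = c("Male", "Male", "Female", "Female", "Female", "Female", "Male",
          "Female", "Female", "Female", "Female", "Female", "Female",
          "Female", "Female", "Female", "Female", "Female", "Female", "Male"),
  Grade = c(80, 83, 90, 91, 85, 69, 91, 97, 78, 81,
            98, 87, 97, 90, 67, 91, 84, 93, 93, 67)
)

# One mean per group.
group_means <- tapply(grades$Grade, grades$Sex, mean)

# Group mean repeated on every row, similar in spirit to
# ddply(x, "Sex", transform, ...), but keeping the original row order.
grades$Grade.Average <- ave(grades$Grade, grades$Sex, FUN = mean)

print(group_means)
```

One visible difference is row order: ‘ddply()’ returns the rows grouped by Sex, while ‘ave()’ leaves them where they were.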
✔The ‘write.table’ function is used to write the table ‘y’ to the file ‘Sorted_Average’.
> #Write this table to a file
> write.table(y,"Sorted_Average")
✔Adding ‘sep=","’ separates the columns in the output file with commas.
> #Generate a CSV(comma-separated values)
> write.table(y,"Sorted_Average",sep=",")
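By default, ‘write.table’ also writes the row names and puts quotes around text values, which may not be wanted in a CSV file. The sketch below shows one way to turn both off and then read the file back as a check. The names ‘demo’, ‘out_file’ and ‘check’ are made up for the illustration, and a temporary file is used so that no real file is overwritten.

```r
# Hypothetical two-row table standing in for 'y'.
demo <- data.frame(Name = c("Raul", "Lauri"),
                   Sex = c("Male", "Female"),
                   Grade = c(80, 90),
                   Grade.Average = c(80.25, 86.9375))

# tempfile() gives a throwaway path for the sketch.
out_file <- tempfile(fileext = ".csv")

# row.names = FALSE drops the 1, 2, ... row labels;
# quote = FALSE leaves text values unquoted.
write.table(demo, out_file, sep = ",", row.names = FALSE, quote = FALSE)

# Read the file back to confirm that it round-trips.
check <- read.table(out_file, header = TRUE, sep = ",")
print(check)
```

Whether to keep the row names depends on how the file will be used; for a file opened in a spreadsheet, leaving them out usually keeps the header aligned with the data columns.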
✔The following code creates a new table ‘newx’ that consists of the rows of ‘x’ whose names contain the letter ‘I’ or ‘i’. The ‘subset()’ function is used to select rows and columns from a data frame. The ‘grepl’ function tests whether the pattern is found in each name and returns TRUE or FALSE for each one.
> #Filter the names in the given list that contain the letter i or I.
> newx = subset(x,grepl("[iI]",x$Name))
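To see what ‘grepl’ returns on its own, it can be run on a short vector of names. This is a minimal sketch: ‘sample_names’ is a name made up for the illustration, and "Ida" is a hypothetical name that is not in the dataset, added only to show a match on the capital letter.

```r
# Four names from the dataset plus one hypothetical name ("Ida")
# that starts with a capital I.
sample_names <- c("Raul", "Lauri", "Booker", "Aiko", "Ida")

# Character class: matches a lowercase or uppercase i anywhere in the name.
hits_class <- grepl("[iI]", sample_names)

# The same test written with ignore.case instead of a character class.
hits_nocase <- grepl("i", sample_names, ignore.case = TRUE)

# Keep only the names where the pattern was found.
print(sample_names[hits_class])
```

Both forms give the same logical vector here; ‘subset()’ then keeps the rows where that vector is TRUE.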
✔The ‘write.table’ function is used to write the table ‘newx’ to the file ‘DataSubset’.
> #Writes this subset to a file
> write.table(newx,"DataSubset",sep=",")
In this way, these commands let us import data into R and write tables to files. We can also filter the rows by name and compute the grade average for each group using the ‘plyr’ package.
URL to git repo: https://github.com/VedaVangala/vedas-r-repo/tree/main/R8
