How to use the aggregate function in R to perform computation on measures that are categorized by some variables in a data frame

In today's fast-paced world, tremendous amounts of data are being recorded periodically. Such data may come from sensors that record a measurement along with categorizing attributes such as the time and the sensor type.

To make sense of such data, many data analysts use the R programming language. Apart from being free, R has many features that make my data analysis work easier.

This post documents how I use the aggregate function in R, which I often rely on to make sense of the large amounts of data that I get my hands on.

To remember how to use the aggregate function, I recite the following sentence in my head before writing the code:

Aggregate a_measurement_column, by a_type_column on the data with the function a_function.
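As a minimal sketch of how that sentence maps onto R's formula syntax, assume a small hypothetical data frame called readings, with made-up sensor_type and value columns. The measurement column goes on the left of ~, the type column goes on the right, the data frame is passed as data, and the function is passed as FUN. The examples in this post pass the function name as a string, such as 'mean'. aggregate also accepts the function object itself, as this sketch does.

```r
# Hypothetical sensor data (made up for illustration):
# one type column and one measurement column.
readings <- data.frame(
  sensor_type = c("temp", "temp", "humidity", "humidity", "temp"),
  value       = c(20.5, 21.0, 55.0, 57.0, 19.5)
)

# "Aggregate value, by sensor_type on readings with the function mean."
#   aggregate(a_measurement_column ~ a_type_column, data = the_data, FUN = a_function)
averageBySensorType <- aggregate(value ~ sensor_type, data = readings, FUN = mean)
print(averageBySensorType)
```

The result has one row per sensor type, holding the average of value within that type.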

With that, I can easily write R code similar to the following examples. These examples aggregate the miles per gallon (mpg) values in the mtcars data frame, which is provided as part of the R datasets package.

Example to get the average of measurements that are categorized by one column

> averageMpgByCyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = 'mean')
> head(averageMpgByCyl)
  cyl      mpg
1   4 26.66364
2   6 19.74286
3   8 15.10000
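The formula is not the only way to call aggregate. As a sketch, the same averages can also be obtained through its non-formula interface. This interface takes the measures as a data frame x and the grouping columns as a named list by. The list name (cyl here) becomes the name of the grouping column in the result.

```r
# Non-formula interface: x is a data frame of measures,
# by is a named list of grouping vectors.
averageMpgByCylAlt <- aggregate(mtcars["mpg"], by = list(cyl = mtcars$cyl), FUN = mean)
print(averageMpgByCylAlt)
```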

Example to get the average of measurements that are categorized by more than one column

> averageMpgByGearAndCyl <- aggregate(mpg ~ gear + cyl, data = mtcars, FUN = 'mean')
> head(averageMpgByGearAndCyl)
  gear cyl    mpg
1    3   4 21.500
2    4   4 26.925
3    5   4 28.200
4    3   6 19.750
5    4   6 19.750
6    5   6 19.700
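Note that head() prints only the first six rows. As a quick check, the full result contains one row for each gear and cyl combination that actually occurs in the data. mtcars has no 4-gear, 8-cylinder car, so that combination is simply absent from the result rather than reported as NA.

```r
averageMpgByGearAndCyl <- aggregate(mpg ~ gear + cyl, data = mtcars, FUN = mean)

# One row per combination present in the data (8 here, not 3 x 3 = 9).
print(nrow(averageMpgByGearAndCyl))

# The rows for the 8-cylinder cars, which head() did not show.
print(averageMpgByGearAndCyl[averageMpgByGearAndCyl$cyl == 8, ])
```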

Get the sum of measures that are categorized by one column

> sumMpgByCyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = 'sum')
> head(sumMpgByCyl)
  cyl   mpg
1   4 293.3
2   6 138.2
3   8 211.4
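The same sentence works with other functions. For example, "aggregate mpg, by cyl on mtcars with the function length" counts the cars in each group. As a small sketch, dividing the per-group sum by that count recovers the per-group average.

```r
sumMpgByCyl   <- aggregate(mpg ~ cyl, data = mtcars, FUN = sum)
countMpgByCyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = length)
print(countMpgByCyl)

# Sum divided by count gives back the per-group average.
print(sumMpgByCyl$mpg / countMpgByCyl$mpg)
```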

Get the sum of measures that are categorized by more than one column

> sumMpgByGearAndCyl <- aggregate(mpg ~ gear + cyl, data = mtcars, FUN = 'sum')
> head(sumMpgByGearAndCyl)
  gear cyl   mpg
1    3   4  21.5
2    4   4 215.4
3    5   4  56.4
4    3   6  39.5
5    4   6  79.0
6    5   6  19.7
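FUN is not limited to built-in functions. Any function that takes the group's numeric vector and returns a value will work, including an anonymous function written inline. As an illustrative sketch, the spread between the highest and lowest mpg for each cylinder count is just one arbitrary choice of function.

```r
# Any function of a numeric vector can be passed as FUN, including an anonymous one.
mpgRangeByCyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = function(x) max(x) - min(x))
print(mpgRangeByCyl)
```

An anonymous function has no name to put in quotes, so it has to be passed as a function object rather than as a string.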

About Clivant

Clivant a.k.a. Chai Heng enjoys composing software and building systems to serve people. He owns techcoil.com and hopes that whatever he has written and built so far has benefited people. All views expressed belong to him and are not representative of the company that he works or has worked for.