# Performing an SQL Style “Group By” in Matlab

### Sep

#### 11

On a recent project, I found that I needed to group data in one matrix by values in another matrix, but I could not find a nice way of doing this. So, I wrote a simple Matlab function to do this sort of grouping. I am describing it here in case others run into a similar need.

Let me start off by describing some sample data. Assume we have been collecting data on the number of cars that pass a checkpoint by hour for several days. We might end up with data that looks something like this:

Day | hour | Cars |

1 | 8 | 47 |

1 | 9 | 38 |

1 | 10 | 35 |

2 | 8 | 52 |

2 | 9 | 42 |

2 | 10 | 37 |

3 | 8 | 43 |

3 | 9 | 34 |

3 | 10 | 45 |

In SQL, we might do something like:

*SELECT day,sum(cars) from data group by day* or

*SELECT hour,sum(cars) from data group by hour*

What I did was create a function, I call ‘bucket’ for gathering the data into buckets.

function [bucket,total] = bucket(buckets, values)

% Create an Array of values and buckets

A = cat(2,values, buckets);% Sort the array on the buckets

B = sortrows(A,2);

sortedbuckets=B(:,2);% Get the number of elements in the array

n=size(sortedbuckets,1);% Set distinctbuckets to be different buckets. Initially, all the same

distinctbuckets(1:n) = 1;

% Then, compare the elements of the buckets with the values immediately following

distinctbuckets(2:n) = ( sortedbuckets(1:n-1) ~= sortedbuckets(2:n) );

% cumsum does a cumulative sum. As long as the bucket values are the same, the value

% in distinctbuckets will be zero and the resulting uniquebucket stays the same.

% when they are different, the value is 1 and the uniquebuckets are incremented by 1uniquebuckets=cumsum(distinctbuckets);

% This results in an index we can use for accumarry. We store the results in total

total=accumarray(uniquebuckets',B(:,1));

% And the bucket value is the unique sorted bucket values

bucket=unique(B(:,2));

return;

This works nicely for numeric values. If you have this all as a single large matrix, you can select out the appropriate columns. For example

[day,totals] = bucket(m(:,1),m(:,3)) % the first column being the day and the third column being the cars.

[hour,totals] = bucket(m(:,2),m(:,3)) % works just as nicely for grouping by the hours.

With less than a dozen lines of actual code, you can implement a quick and efficient SQL Group BY function in Matlab.

## Recent comments

6 years 10 weeks ago

6 years 13 weeks ago

6 years 33 weeks ago

6 years 43 weeks ago

7 years 1 week ago

7 years 5 weeks ago

7 years 5 weeks ago

7 years 9 weeks ago

7 years 11 weeks ago

7 years 16 weeks ago