Kickstarting R - Frequencies

Displaying the frequencies of vectors of integers or factors

Frequency tables are almost mandatory in the initial stage of examining data. freq() allows you to produce simple but easily understood tables of frequencies for variables with a small number of integer values.

Because freq() depends upon tabulate(), there are a number of workarounds in the function. Most of these are due to the way that a vector of integers is translated into a factor object. A factor stores its values internally as integer codes starting at 1, one code for each level that is present. This means that if zero is the first level in a factor object, the count of zeros will land in position "1", the next value in position "2" and so on. Values that aren't present will be skipped. So if you have a bunch of perfectly valid zeros in your data and a category or two with no observations, you will get a result that looks markedly different from what you might expect. Worse still, you might get a result that looks meaningful, but doesn't represent the data in the way you think it does.
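The relabeling described above can be seen with a small, made-up vector. This is only an illustrative sketch using base R; the variable names are assumptions and none of it is taken from freq() itself.

```r
# Illustrative data: valid zeros, and no observations of the value 2.
x <- c(0, 0, 1, 3)

f <- factor(x)
levels(f)                       # the labels: "0" "1" "3"
codes <- as.integer(f)          # the internal codes: 1 1 2 3

# Tabulating the codes gives one count per level, numbered 1, 2, 3,
# so the zeros sit in position 1 and the empty category 2 has vanished.
code_counts <- tabulate(codes)  # 2 1 1

# Tabulating the raw values keeps the gap at 2 but drops the zeros,
# because tabulate() only counts positive integers.
raw_counts <- tabulate(x)       # 1 0 1
```

Neither result matches the data as a reader would expect to see it, which is why the counts need to be relabeled before display.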

If category labels are supplied to freq(), it uses them, but otherwise it will try to use the entire range of values, displaying zero observations where necessary. Zeros only seem to be a problem when they are the first category, so the trick of adding 1 to the values is only applied in this case. tabulate() also does not report NAs, so they have to be counted separately, removed, then added to the frequency counts after tabulate() has done its job.
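The steps in this paragraph might be sketched roughly as follows. The function name freq_sketch is hypothetical and this is not the source of freq(); it assumes a non-empty vector of non-negative integers and ignores user-supplied labels.

```r
# Hypothetical sketch of the workaround, not the actual freq() code.
# Assumes x holds non-negative integers with at least one non-NA value.
freq_sketch <- function(x) {
  # Count the NAs separately, then remove them.
  n_missing <- sum(is.na(x))
  x <- x[!is.na(x)]

  # Add 1 only when zero is the first category, so that
  # tabulate() sees positive integers.
  offset <- if (min(x) == 0) 1 else 0

  # Cover the whole range so empty categories show a count of zero.
  counts <- tabulate(x + offset, nbins = max(x) + offset)

  # Label each count with the original value, not the shifted one.
  names(counts) <- seq_along(counts) - offset

  # Add the NA count back after tabulate() has done its job.
  c(counts, "NA" = n_missing)
}

result <- freq_sketch(c(0, 0, 1, 3, NA))
print(result)
```

With this input the zeros are labeled "0", the unobserved value 2 appears with a count of zero, and the single NA is reported at the end.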

Finally, freq() will display category percentages if they are requested. freq() is a good introduction to getting R to display output in a form that can be presented to those used to the vanilla stats report.
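As a rough illustration of the percentage display, the counts can be divided by their total and bound alongside the raw frequencies. The counts here are made up for the example, and the layout is an assumption rather than the exact output of freq().

```r
# Made-up frequency counts, named by category.
counts <- c("0" = 2, "1" = 1, "2" = 0, "3" = 1)

# Percentage of observations in each category, to one decimal place.
percent <- round(100 * counts / sum(counts), 1)

# Put counts and percentages side by side, one row per category.
freq_table <- cbind(Count = counts, Percent = percent)
print(freq_table)
```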
