Descriptive Statistics

8.2.1. Correlation Tool

Figure 8-4Correlation Tool Dialog

The correlation tool calculates the pairwise Pearson correlation coefficients of the given variables. Use this tool to calculate any number of correlation coefficients at the same time. The variables for which the correlations are calculated are specified by the Input Range: entry. The input range can consist of either a single range or a comma separated list of ranges. The given range or ranges can be grouped by columns, by rows, or by areas.

If the first row or column of the given ranges, or the first field of each area contains labels, the Labels option should be selected.

Figure 8-5Some Example Data
Example 8-1Using the Correlation Tool

For example, you want to calculate the correlation between three variables, one each in columns A, B, and C. Both variables have 10 values in rows 2 to 11 with labels in row 1 (see Figure 8-5).

  1. Enter A1:B11 in the Input Range: entry by typing this directly into the entry or clicking in the entry field and then selecting that range on the sheet. In the latter case the entry will also contain the sheet name.
  2. Select the Columns radio button next to Grouped By:, since each variable is in its own column.
  3. Select the Labels option since the first row contains labels. (see Figure 8-6).
  4. Specify the output options as described above.
  5. Press the OK button.

The calculated correlations are given in a table with each column and row labeled with the names of the variables. If the names are not given in the input range, Gnumeric generates them. In our example, the correlation between the variables in column A and B, can be found in the second column and third row of the results table (see Figure 8-7).

Figure 8-6Completed Correlation Dialog
Figure 8-7Correlation Tool Output

8.2.2. Covariance Tool

Figure 8-8Covariance Tool Dialog

The covariance tool calculates the pairwise covariance coefficients of the given variables. Use this tool to calculate any number of covariance coefficients at the same time. The variables for which the covariances are calculated are specified by the Input Range: entry. The input range can consist of either a single range or a comma separated list of ranges. The given range or ranges can be grouped by columns, by rows, or by areas.

If the first row or column of the given ranges, or the first field of each area contains labels, the Labels option should be selected.

Figure 8-9Some Example Data
Example 8-2Using The Covariance Tool

For example, you want to calculate the covariance between three variables, one each in columns A, B, and C. Both variables have 10 values in rows 2 to 11 with labels in row 1 (see Figure 8-9).

  1. Enter A1:B11 in the Input Range: entry by typing this directly into the entry or clicking in the entry field and then selecting that range on the sheet. In the latter case the entry will also contain the sheet name.
  2. Select the Columns radio button next to Grouped By:, since each variable is in its own column.
  3. Select the Labels option since the first row contains labels.
  4. Specify the output options as described above.
  5. Press the OK button.

The calculated covariances are given in a table with each column and row labeled with the names of the variables. If the names are not given in the input range, Gnumeric generates them. In our example, the covariance between the variables in column A and B, can be found in the second column and third row of the results table (see Figure 8-10).

Figure 8-10Covariance Tool Output

8.2.3. Descriptive Statistics Tool

Figure 8-11Descriptive Statistics Tool Dialog

The descriptive statistics tool calculates various statistics for the given variables and a confidence interval for the population mean. The variables are specified via the Input Range: entry. The given range or list of ranges can be grouped into variables by columns, rows, or areas.

This tool can produce four different kinds of statistical data.

  • If the Summary Statistics option is selected, this tool calculates the mean, standard error, median, mode, standard deviation, sample variance, kurtosis, skewness, range, minimum, maximum, sum, and count for each variable.

  • If the Confidence Interval for the Mean option is selected, the tool calculates confidence intervals for the population mean of each variable. Specify the confidence level in the entry box. The default confidence level is 95%.

    The interval given will usually be wider than the interval obtained using the CONFIDENCE function. The CONFIDENCE function assumes that the population standard deviation is known. This tool estimates the population standard deviation using the sample standard deviation.

  • If the Kth Largest: option is selected, the tool finds the kth largest value of each of the variables. Specify k in the entry box next to the option. The default is 1.

  • If the Kth Smallest: option is selected, the tool finds the kth smallest value of each of the variables. Specify k in the entry box next to the option. The default is 1.

If the first entry for each variable contains the label, select the Labels option.

Figure 8-12Some Example Data
Example 8-3Using the Descriptive Statistics Tool

Figure 8-12 shows some example data, Figure 8-13 the selected options, and Figure 8-14 the corresponding output.

Figure 8-13The Options Page For Descriptive Statistics
Figure 8-14Descriptive Statistics Tool Output

8.2.4. Frequency Tables

Gnumeric provides two types of frequencies tables:

  • The frequency table tools is primarily useful for non-numeric data (data of nominal and ordinal level of measurement). It allows to determine frequencies for given values.
  • The histogram tool is useful for numeric data that is supposed to be classified into a certain number of intervals. These intervals can be either specified or calculated.

8.2.4.1. Frequency Tables Tool

8.2.4.1.1. Introduction

The frequency tool can be used to create frequency tables for non-numerical data. It presents this table numerically as well as graphically.

If your data are numeric and you want to accumulate whole intervals of values into frequency counts then this tool is not appropriate. In that case you may want to use the histogram table tool described in section Section 8.2.4.2 ― Histogram Tool.

Figure 8-15Frequency Tool Dialog

As shown in Figure 8-15, the frequency table dialog has four tabs. We will introduce them in sequence.

8.2.4.1.2. The Input Tab

The Input tab shown in Figure 8-15 contains the field specifying the data to be used for the histogram.

The Input Range entry contains a single range or a list of ranges, that can be grouped into variables by rows, columns, or areas.

If the first row or column of the given input ranges, or the first field of each area contains labels, the Labels option should be selected. If the input is grouped by areas and the top left cell contains a label, the other cells in the first row are being ignored.

8.2.4.1.3. The Categories Tab

The Categories tab permits the specification of a range that contains the possible values that are supposed to be counted in the input range.

The Automatic categories option is disabled since it is not yet implemented.

Figure 8-16Frequency Tool Dialog Categories Tab

8.2.4.1.4. The Graphs & Options Tab

The Graphs & Options tab allows various options to be set. In the top half of the tab you can choose whether you would like a graph to be created. If you choose to have a graph created you can specify whether you would like to see a bar chart or a column chart.

In the bottom part of the tab you can select the percentages option. This option replaces the frequency counts with percentages.

If the categories range contains repeated values, then the percentages may add up to more than 100%. If the categories range does not contain all values that occur in the input range, the percentages may sum to less than 100%.

The Use exact comparisons checkbox determines how category values and input range values are compared. If it is checked then the function EXACT is used for the comparison. If it isn't checked then simple equality is used. In this latter case, empty cells and cells containing the numerical value 0 are considered equal. As a consequence you usually want that checkbox to be selected.

Figure 8-17Frequency Tool Dialog Graphs & Options Tab

8.2.4.1.5. Frequency Tool Results

Figure 8-18Frequency Tool Results

8.2.4.2. Histogram Tool

8.2.4.2.1. Introduction

The histogram tool can be used to create histograms or frequency tables for numerical data. Using this tool you can define intervals, or “bins”. The tool determines how many data points belong to each bin and presents this number numerically as well as graphically.

If your data are non-numeric this tool is not appropriate. In that case you may want to use the frequency table tool described in section Section 8.2.4.1 ― Frequency Tables Tool.

Figure 8-19Histogram Tool Dialog

As shown in Figure 8-19, the histogram dialog has five tabs. We will introduce them in sequence.

8.2.4.2.2. The Input Tab

The Input tab shown in Figure 8-19 contains the field specifying the data to be used for the histogram.

The Input Range entry contains a single range or a list of ranges, that can be grouped into variables by rows, columns, or areas.

If the first row or column of the given input ranges, or the first field of each area contains labels, the Labels option should be selected. If the input is grouped by areas and the top left cell contains a label, the other cells in the first row are being ignored.

8.2.4.2.3. The Cutoffs Tab

Figure 8-20Histogram Tool Dialog Cutoffs Tab

The cutoffs for the histogram can either be predetermined by data contained in your workbook or calculated by the histogram tool. These cutoffs determine bins as defined by the selection on the Bins tab.

Select the Predetermined Cutoffs option to specify data on your worksheet in the Cutoff Range: entry. The values in this range will be used as cutoffs c1, c2, and so on to cn.

Select the Calculated Cutoffs option to have the cutoffs determined by the tool. Enter the desired number of cutoffs in the Number of Cutoffs entry. It is strongly recommended (but optional) that you specify the minimum and maximum cutoffs in the Minimum cutoff and Maximum cutoff entries. If the minimum or maximum cutoff is not specified, the tool will use the minimum and/or maximum of the current data.

8.2.4.2.4. The Bins Tab

Figure 8-21Histogram Tool Dialog Bins Tab

The bins tab is used to determine how the cutoffs c1, c2, and so on to cn are translated into bins. Specifically, it has to be determined whether first and/or last bins reaching from −∞ to c1 and from cn to ∞ are added and whether data points that much cutoffs exactly are included in the bin to the right or the left.

For example the option [∙,∙),[∙,∙),⋯, [∙,∙),[∙,∞) indicates that the first bin starts at the first cutoff while the last bin ends at ∞. Moreover, each cutoff value belongs to the bin on its right.

8.2.4.2.5. The Graphs & Options Tab

Figure 8-22Histogram Tool Dialog Graphs & Options Tab

The options in the graphs and options tab specify any graph to be created and modify the appearance of the histogram:

  • The No chart option causes the chart to be omitted.

  • The Bar chart option causes a bar chart to be added to the histogram. For each bin, the bar chart shows a horizontal bar indicating the frequency.

    The Column chart option causes a column chart to be added to the histogram. For each bin, the column chart shows a vertical bar indicating the frequency.

    The Histogram chart option causes a histogram chart to be added to the histogram. For each bin, the histogram chart shows a vertical bar indicating the density (that is the frequency divided by the width of the bin).

  • The Percentages option causes the frequencies to be expressed as percentages.

  • The Cumulative answers option causes a cumulative frequency table (either with counts or with pecentages) to be created.

  • The Count numbers only option determines whether only numbers are counted. If also non-numbers are counted they are first converted into numbers, usually into 0.

8.2.4.2.6. The Output Tab

The Output tab contains the standard output options and fields described in Section 8.1 ― Overview.

8.2.4.2.7. A Histogram Example

Figure 8-23Some Example Data
Example 8-4Using the Histogram Tool

For example, you want to calculate a histogram for the number of successes in several sequences of trials. The numbers of successes are recorded in column A and the cutoffs of interest in column C (see Figure 8-23).

  1. Enter A1:A31 in the Input Range: entry of the Input tab by typing this directly into the entry or clicking in the entry field and then selecting that range on the sheet. In the latter case the entry may also contain the sheet name.
  2. Since you only have one variable select the Areas or Columns radio button next to Grouped By:.
  3. Select the Labels option since the first cell of the Input Range contains a label.
  4. Enter C2:C5 in the Cutoff Range: entry of the Cutoffs tab. The Predetermined Cutoffs option will now also be selected (see Figure 8-24).
  5. In the Bins tab select the second option since we want to add two bins reaching to ∓∞ and we want to count each cutoff value in the bin to its right (see Figure 8-25).
  6. Select the Percentage option of the Graphs &Options tab to have the frequencies expressed as percentages.
  7. Select the Column Chart option of the Graphs &Options tab to have a column chart added to the histogram (see Figure 8-26).
  8. In the Output tab, specify the output options as described in Section 8.1 ― Overview.
  9. Press the OK button.

The results are shown in Figure 8-27. Note that the graph will by default appear on top of the histogram table. It usually needs to be moved in to proper position. That has already been done here.

Figure 8-24Histogram Tool: Specifying Cutoffs
Figure 8-25Histogram Tool: Specifying Bins
Figure 8-26Histogram Tool: Specifying Options
Figure 8-27Histogram Tool Output

8.2.5. Rank and Percentile Tool

Figure 8-28Rank and Percentile Tool Dialog

Use this tool to rank given data and to calculate the percentiles of each data point.

Specify the datasets to use in the Input Range: entry. The given range can be grouped into datasets by columns, by rows, or by areas.

For each dataset, the tool creates three columns in the output table:

  1. The first column gives the indices of the ordered data from largest to smallest data value.
  2. The second column gives data values corresponding to the indices in the first column.
  3. The third column indicates the percentile of the data value in the second column.

If you have labels in the first cell of each data set, select the Labels option.

Figure 8-29Some Example Data for the Rank and Percentile Tool
Example 8-5Using the Rank and Percentile Tool

Figure 8-29 shows some example data and Figure 8-30 the corresponding output.

Figure 8-30Rank and Percentile Tool Output

In the case of ties, the rank calculated by this tool differs from the value of the RANK function for the same data. This tool calculates the rank as it is normally used in Statistics: If two values are tied, the assigned rank is the average rank for those entries. For example in Figure 8-29 the two values 10 are the second and third largest values. Since they are equal each receives the rank of 2.5, the average of 2 and 3. The rank function on the other hand assigns the rank as it is normally used to determine placements. The two values 10 would therefore each receive a rank of 2.