## statistics -- basic statistical measurements

The module contains functions for computing basic statistical measures of data samples.

### Index

namespace [stat](#stat)

Functions:

- [mean](#mean)(invar _data_: array<@T<int|float>>) => float
- [variance](#variance)(invar _data_: array<@T<int|float>>, _kind_: enum<sample,population> = $sample) => float
- [variance](#variance)(invar _data_: array<@T<int|float>>, _mean_: float, _kind_: enum<sample,population> = $sample) => float
- [median](#median)(_data_: array<@T<int|float>>) => float
- [percentile](#percentile)(_data_: array<@T<int|float>>, _percentage_: float) => float
- [mode](#mode)(invar _data_: array<@T<int|float>>) => @T
- [range](#range)(invar _data_: array<@T<int|float>>) => tuple<min: @T, max: @T>
- [distribution](#distribution1)(invar _data_: array<@T<int|float>>) => map<@T,int>
- [distribution](#distribution2)(invar _data_: array<@T<int|float>>, _interval_: float, _start_ = 0.0) => map<int,int>
- [correlation](#correlation)(invar _data1_: array<@T<int|float>>, invar _data2_: array<@T>, _coefficient_: enum<pearson,spearman>) => float
- [correlation](#correlation)(invar _data1_: array<@T<int|float>>, invar _data2_: array<@T>, _coefficient_: enum<pearson,spearman>, _mean1_: float, _mean2_: float) => float
- [skewness](#skewness)(invar _data_: array<@T<int|float>>) => float
- [skewness](#skewness)(invar _data_: array<@T<int|float>>, _mean_: float) => float
- [kurtosis](#kurtosis)(invar _data_: array<@T<int|float>>) => float
- [kurtosis](#kurtosis)(invar _data_: array<@T<int|float>>, _mean_: float) => float

### Functions

```ruby
mean(invar data: array<@T<int|float>>) => float
```

Returns the arithmetic mean of *data*.

E[X] = Σx / N

**Errors:** `Value` when *data* is empty

```ruby
variance(invar data: array<@T<int|float>>, kind: enum<sample,population> = $sample) => float
variance(invar data: array<@T<int|float>>, mean: float, kind: enum<sample,population> = $sample) => float
```

Returns the variance of *data* (a measure of spread) of the given *kind*. Uses *mean* if it is given.

Sample: σ²[X] = Σ(x - E[X])² / (N - 1)

Population: σ²[X] = Σ(x - E[X])² / N

**Errors:** `Value` when *data* is empty or contains a single item

```ruby
median(data: array<@T<int|float>>) => float
```

Returns the median (middle value) of *data*, partially sorting it in the process. If the size of *data* is even, the mean of the two middle values is returned.

```ruby
percentile(data: array<@T<int|float>>, percentage: float) => float
```

Returns the percentile *percentage* (the value below which the given percentage of sample values fall) of *data*, partially sorting it in the process. *percentage* must be in the range (0; 100).

**Errors:** `Value` when *data* is empty, `Param` when *percentage* is invalid

```ruby
mode(invar data: array<@T<int|float>>) => @T
```

Returns the mode (most common value) of *data*.

**Errors:** `Param` when *data* is empty

```ruby
range(invar data: array<@T<int|float>>) => tuple<min: @T, max: @T>
```

Returns the minimum and maximum values in *data*.

**Errors:** `Value` when *data* is empty

```ruby
distribution(invar data: array<@T<int|float>>) => map<@T,int>
```

Returns the distribution of values in *data* in the form `value` => `frequency`, where `value` is a single unique value and `frequency` is the number of its appearances in *data*.

**Errors:** `Value` when *data* is empty

```ruby
distribution(invar data: array<@T<int|float>>, interval: float, start = 0.0) => map<int,int>
```

Returns the values of *data* grouped into ranges of width *interval* starting from *start*. The result has the form `index` => `frequency` and covers only the ranges present in the sample. `index` identifies the range: it equals the whole number of intervals of width *interval* between *start* and the beginning of that range, so the exact range boundaries are [*start* + `index` × *interval*; *start* + (`index` + 1) × *interval*). `frequency` is the number of values that fall in the range.
Values less than *start* are not included in the resulting statistics.

**Errors:** `Param` when *data* is empty or *interval* is zero

```ruby
correlation(invar data1: array<@T<int|float>>, invar data2: array<@T>, coefficient: enum<pearson,spearman>) => float
correlation(invar data1: array<@T<int|float>>, invar data2: array<@T>, coefficient: enum<pearson,spearman>, mean1: float, mean2: float) => float
```

Returns the correlation *coefficient* between *data1* and *data2*. The Pearson coefficient measures linear dependence; Spearman's rank coefficient measures monotonic dependence. If *mean1* and *mean2* are given, they are used for calculating the Pearson coefficient.

**Note:** *data1* and *data2* must be of equal size

Pearson: r[X,Y] = E[(X - E[X])(Y - E[Y])] / (σ[X]σ[Y])

Spearman's rank: ρ[X,Y] = r[rank(X), rank(Y)]

**Errors:** `Value` when *data1* or *data2* is empty, or when they differ in size

```ruby
skewness(invar data: array<@T<int|float>>) => float
skewness(invar data: array<@T<int|float>>, mean: float) => float
```

Returns the skewness (a measure of asymmetry) of *data*. Uses *mean* if it is given.

γ₁[X] = E[((X - E[X]) / σ)³]

**Errors:** `Value` when *data* is empty

```ruby
kurtosis(invar data: array<@T<int|float>>) => float
kurtosis(invar data: array<@T<int|float>>, mean: float) => float
```

Returns the kurtosis (a measure of "peakedness") of *data*. Uses *mean* if it is given.

γ₂[X] = E[((X - E[X]) / σ)⁴] - 3

**Errors:** `Value` when *data* is empty
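
The variance, skewness, and kurtosis formulas in this reference can be checked by hand with a small sketch. The Python below is only an illustration of the formulas. It is not Dao and not this module's implementation. All function names are hypothetical, and it assumes that σ in the skewness and kurtosis formulas is the population standard deviation, which the text does not state.

```python
import math


def mean(data):
    """Arithmetic mean: E[X] = sum(x) / N."""
    if not data:
        raise ValueError("data is empty")
    return sum(data) / len(data)


def variance(data, kind="sample"):
    """Sum of squared deviations divided by N - 1 (sample) or N (population)."""
    n = len(data)
    if n < 2:
        raise ValueError("data is empty or contains a single item")
    m = mean(data)
    squared_deviations = sum((x - m) ** 2 for x in data)
    return squared_deviations / (n - 1 if kind == "sample" else n)


def standardized_moment(data, order):
    """E[((X - E[X]) / sigma) ** order], with sigma taken as the
    population standard deviation (an assumption of this sketch)."""
    m = mean(data)
    sigma = math.sqrt(variance(data, "population"))
    if sigma == 0:
        raise ValueError("all values are equal, so the moment is undefined")
    return sum(((x - m) / sigma) ** order for x in data) / len(data)


def skewness(data):
    """Third standardized moment."""
    return standardized_moment(data, 3)


def kurtosis(data):
    """Fourth standardized moment minus 3 (excess kurtosis)."""
    return standardized_moment(data, 4) - 3


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean(data))                    # 5.0
print(variance(data, "population"))  # 4.0
print(variance(data, "sample"))      # 32/7, about 4.571
print(skewness(data))                # 0.65625
print(kurtosis(data))                # -0.21875
```

The sample has a longer right tail (positive skewness) and is slightly flatter than a normal distribution (negative excess kurtosis). The sample variance exceeds the population variance because the same sum of squared deviations is divided by N - 1 instead of N.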