Diffstat (limited to 'doc')
| -rw-r--r-- | doc/cdf.1 | 106 |
1 files changed, 106 insertions, 0 deletions
diff --git a/doc/cdf.1 b/doc/cdf.1
new file mode 100644
index 0000000..06f3a49
--- /dev/null
+++ b/doc/cdf.1
@@ -0,0 +1,106 @@
+.\" Copyright (c) 2025
+.\" Manual page for cdf(1)
+.\"
+.Dd $Mdocdate$
+.Dt CDF 1
+.Os
+.Sh NAME
+.Nm cdf
+.Nd calculate cumulative distribution function from count data
+.Sh SYNOPSIS
+.Nm cdf
+.Op Fl f | u
+.Op Fl r
+.Op Fl h
+.Op Ar file
+.Sh DESCRIPTION
+The
+.Nm
+utility computes cumulative distribution functions from input data consisting
+of count-value pairs. It reads data from
+.Ar file
+or standard input if no file is specified, calculates relative frequencies,
+and outputs probability-value pairs suitable for statistical analysis and
+plotting.
+.Pp
+Input data must consist of whitespace-separated pairs where the first field
+is the count (frequency) and the second field is the data value. Output
+consists of cumulative probability values followed by the corresponding data
+values, separated by tabs.
+.Pp
+.Sh OPTIONS
+.Bl -tag -width Ds
+.It Fl f
+Read input values as floats.
+.It Fl u
+Read input values as unsigned integers.
+.It Fl r
+Generate the complementary cumulative distribution function (CCDF),
+which is P(X > x) = 1 - F(x).
+.It Fl h
+Display usage information and exit.
+.El
+.Pp
+If neither
+.Fl f
+nor
+.Fl u
+is specified, the input values are read as signed integers by
+default.
+.Sh INPUT FORMAT
+Each input line must contain exactly two whitespace-separated fields:
+.Bd -literal -offset indent
+count value
+.Ed
+.Pp
+Where
+.Em count
+is a positive integer representing the frequency of occurrence, and
+.Em value
+is the data point.
+.Pp
+Example input:
+.Bd -literal -offset indent
+15 1.25
+23 2.30
+8 3.75
+.Ed
+.Pp
+This format was selected to be compatible with the output of the uniq -c command.
+.Sh EXIT STATUS
+.Ex -std
+.Sh EXAMPLES
+Calculate CDF from integer data in a file:
+.Bd -literal -offset indent
+$ cdf data.txt
+0.300000000000000 10
+0.650000000000000 15
+1.000000000000000 20
+.Ed
+.Pp
+Generate complementary CDF in a pipeline:
+.Bd -literal -offset indent
+$ awk '{print $2, $1}' measurements.dat | cdf -r -f
+1.000000000000000 1.234000
+0.750000000000000 2.567000
+0.400000000000000 4.890000
+.Ed
+.Pp
+Use standard tools for pre-processing:
+.Bd -literal -offset indent
+$ sort -n data.txt | uniq -c | cdf > dist.cdf
+.Ed
+.Sh SEE ALSO
+.Xr sort 1 ,
+.Xr uniq 1 ,
+.Xr gnuplot 1
+.Sh AUTHORS
+.An Douglas B. Rumbaugh
+.Mt "dbrumbaugh@harrisburgu.edu"
+.Sh BUGS
+The program must materialize the full file in order to calculate
+the frequency table. It currently does this in memory, and so very
+large datasets may lead to crashes due to memory allocation failures
+when RAM is limited.
+
+
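For readers who want to check the arithmetic the manual page describes, the following is a minimal Python sketch of the count-to-CDF computation. It is not the cdf(1) implementation, which this diff does not include. The function names (cdf_from_counts, format_rows) are hypothetical, the input is assumed to arrive already sorted by value (as in the sort -n | uniq -c pipeline), and values are passed through as text rather than parsed as signed, unsigned, or float. The complement branch follows the formula stated under -r, P(X > x) = 1 - F(x).

```python
from typing import Iterable, List, Tuple


def cdf_from_counts(lines: Iterable[str],
                    complement: bool = False) -> List[Tuple[float, str]]:
    """Return (probability, value) rows for 'count value' input lines.

    Hypothetical helper: mirrors the behaviour described in the manual
    page, not the actual cdf(1) source.
    """
    rows = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines are skipped
        if len(fields) != 2:
            raise ValueError(f"expected 'count value', got: {line!r}")
        count = int(fields[0])
        if count <= 0:
            raise ValueError(f"count must be a positive integer: {line!r}")
        rows.append((count, fields[1]))

    # The total is needed before any probability can be emitted, which is
    # why the whole input has to be held in memory first.
    total = sum(count for count, _ in rows)

    out = []
    running = 0
    for count, value in rows:
        running += count
        p = running / total  # F(x): cumulative relative frequency
        out.append((1.0 - p if complement else p, value))
    return out


def format_rows(rows: List[Tuple[float, str]]) -> List[str]:
    """Format rows as tab-separated 'probability<TAB>value' strings."""
    return [f"{p:.15f}\t{value}" for p, value in rows]
```

With the hypothetical input lines "6 10", "7 15" and "7 20", format_rows(cdf_from_counts(...)) yields the probabilities 0.300000000000000, 0.650000000000000 and 1.000000000000000, matching the shape of the first example in the manual page.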