.\" Copyright (c) 2025
.\" Manual page for cdf(1)
.\"
.Dd $Mdocdate$
.Dt CDF 1
.Os
.Sh NAME
.Nm cdf
.Nd calculate cumulative distribution function from count data
.Sh SYNOPSIS
.Nm cdf
.Op Fl f | u
.Op Fl r
.Op Fl h
.Op Ar file
.Sh DESCRIPTION
The
.Nm
utility computes cumulative distribution functions from input data
consisting of count-value pairs.
It reads data from
.Ar file ,
or from standard input if no file is specified, calculates relative
frequencies, and outputs probability-value pairs suitable for statistical
analysis and plotting.
.Pp
Input data must consist of whitespace-separated pairs where the first field
is the count (frequency) and the second field is the data value.
Output consists of cumulative probability values followed by the
corresponding data values, separated by tabs.
.Sh OPTIONS
.Bl -tag -width Ds
.It Fl f
Read input values as floats.
.It Fl u
Read input values as unsigned integers.
.It Fl r
Generate the complementary cumulative distribution function (CCDF),
which is P(X > x) = 1 - F(x).
.It Fl h
Display usage information and exit.
.El
.Pp
If neither
.Fl f
nor
.Fl u
is specified, input values are read as signed integers.
.Sh INPUT FORMAT
Each input line must contain exactly two whitespace-separated fields:
.Bd -literal -offset indent
count value
.Ed
.Pp
Where
.Em count
is a positive integer representing the frequency of occurrence, and
.Em value
is the data point.
.Pp
Example input:
.Bd -literal -offset indent
15 1.25
23 2.30
8 3.75
.Ed
.Pp
This format was selected to be compatible with the output of the
.Ql uniq -c
command.
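.Pp
The relationship between this input and the output can be sketched with
standard tools.
The following is a minimal sketch, not the implementation of
.Nm :
it assumes that each value appears on only one input line and that the
probability reported for a value is the sum of the counts of all values
less than or equal to it, divided by the total count.
The function name
.Li cdf_sketch
is hypothetical, and the sketch separates fields with a space and prints
six decimal places.
.Bd -literal -offset indent
cdf_sketch() {
    # Order the pairs by value (field 2), then accumulate the counts.
    sort -k 2,2n "$@" | awk '{ c[NR] = $1; v[NR] = $2; total += $1 }
        END {
            for (i = 1; i <= NR; i++) {
                run += c[i]
                print sprintf("%.6f", run / total), v[i]
            }
        }'
}
.Ed
.Pp
Applied to the example input above, the cumulative counts are 15, 38 and 46
out of a total of 46, so the sketch prints 0.326087, 0.826087 and 1.000000
for the values 1.25, 2.30 and 3.75.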
.Sh EXIT STATUS
.Ex -std
.Sh EXAMPLES
Calculating the CDF of integer data in a file:
.Bd -literal -offset indent
$ cdf data.txt
0.300000000000000	10
0.650000000000000	15
1.000000000000000	20
.Ed
.Pp
Calculating the complementary CDF as part of a pipeline:
.Bd -literal -offset indent
$ awk '{print $2, $1}' measurements.dat | cdf -r -f
1.000000000000000	1.234000
0.750000000000000	2.567000
0.400000000000000	4.890000
.Ed
.Pp
Using standard tools to preprocess a raw list of measurements into a CDF:
.Bd -literal -offset indent
$ sort -n data.txt | uniq -c | cdf > dist.cdf
.Ed
.Sh SEE ALSO
.Xr sort 1 ,
.Xr uniq 1 ,
.Xr gnuplot 1
.Sh AUTHORS
.An Douglas B. Rumbaugh
.Mt "dbrumbaugh@harrisburgu.edu"
.Sh BUGS
The program must materialize the full file in order to calculate the
frequency table.
It currently does this in memory, and so very large datasets may lead to
crashes due to memory allocation failures when RAM is limited.