Revision 171, 3.3 kB (checked in by dsimcha, 2 years ago)

This library has no dependencies other than the latest versions of Phobos and DMD.  To build,
simply unpack all the files into an empty directory and run:

dmd -O -inline -release -lib -ofdstats.lib *.d

You can also combine dstats with other libraries, etc. as you see fit.  I intend to keep the
build process trivial for the foreseeable future, so that dstats is as easy as possible to set
up and the barrier to entry is as low as possible.

Conventions of this library:

1.  A delicate balance between ease of use, flexibility and performance should be maintained.
There are tons of good libraries for hardcore numerics programmers that emphasize performance above
all else.  There are also tons of good statistics packages for people who are basically
non-programmers and aren't doing large-scale analyses or analyses in the context of larger programs.
The distribution seems very bimodal.  This library tries to target the middle ground and recognize
the principles of tradeoffs and diminishing returns with regard to performance, flexibility
and ease of use.

2.  Heap allocations should be minimized.  Whenever temporary space needs to be allocated internally,
the call stack or TempAlloc is used if possible.  This avoids contention on the shared heap and
allows good multithreaded performance, which matters, for example, when computing large correlation
matrices or performing statistical tests on every exon in the human genome.

3.  Everything should work with the lowest-common-denominator generic range possible.  It's
frustrating to have to write tons of boilerplate code just to translate data from one format into
another.  Also, even if the data is in the form of an array, it often needs to be copied so that it
can be reordered without the reordering being visible to the caller.  In these cases, it can be
copied just as easily whether the input data is in the form of an array or some other range.
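
As a rough illustration of convention 3, the sketch below accepts any input range of numbers and
copies it into a private buffer before sorting, so the caller's data is never reordered.  The name
medianSketch is hypothetical and is not part of dstats; it also allocates the buffer on the GC
heap for brevity, where dstats itself would prefer the call stack or TempAlloc.

```d
import std.algorithm : map, sort;
import std.array : array;
import std.math : isNaN;
import std.range : ElementType, iota, isInputRange;
import std.traits : isNumeric;

// Hypothetical helper (not a dstats function): median of any numeric input range.
double medianSketch(R)(R data)
if (isInputRange!R && isNumeric!(ElementType!R))
{
    // Copy into a private double[] so the sort below is invisible to the caller.
    // The copy is written the same way whether `data` is an array or another range.
    double[] buf = data.map!(x => cast(double) x).array;

    // A defective (empty) dataset yields NaN rather than an exception.
    if (buf.length == 0)
        return double.nan;

    sort(buf);
    immutable mid = buf.length / 2;
    return buf.length % 2 == 1 ? buf[mid] : (buf[mid - 1] + buf[mid]) / 2.0;
}

unittest
{
    int[] arr = [5, 1, 4];
    assert(medianSketch(arr) == 4);
    assert(arr == [5, 1, 4]);                 // caller's array is left untouched
    assert(medianSketch(iota(1, 5)) == 2.5);  // a non-array range works the same way

    int[] none;
    assert(isNaN(medianSketch(none)));
}
```

If this is saved as a single file, the unittest block can be run with
dmd -unittest -main -run followed by the file name.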

4.  Throwing exceptions vs. returning NaN:  The convention here is that an exception should be
thrown if a primitive parameter (e.g. an int or a float) is not in the acceptable range.  This is
because such things can trivially be checked upfront and should not occur by accident in most cases,
except for the case of bugs internal to dstats.  If the errant function parameter is the dataset,
i.e. a range of some kind, then NaN should be returned, because when doing large-scale analyses,
a few pieces of data are expected to be defective in ways that are not easy to check upfront and
should not halt the whole analysis.

In general, this means that dstats.distrib and dstats.gamma should throw on invalid parameters,
and all other modules should return NaN.  Any other result is most likely a bug.
Cases where dstats.tests calls into dstats.distrib, resulting in thrown exceptions, are
unfortunately too common and need to be fixed.
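
A minimal sketch of convention 4 follows.  Both functions are hypothetical stand-ins written for
this README and are not the actual dstats API; the use of enforce (which throws a plain Exception)
is likewise an assumption, since the convention does not say which exception type is thrown.

```d
import std.exception : assertThrown, enforce;
import std.math : exp, isNaN;
import std.range : isInputRange;

// Hypothetical distribution-style function: a bad primitive parameter throws,
// because the caller can trivially check it upfront.
double exponentialCdfSketch(double x, double lambda)
{
    enforce(lambda > 0, "lambda must be positive");
    return x <= 0 ? 0.0 : 1.0 - exp(-lambda * x);
}

// Hypothetical summary-style function: a defective dataset (here, an empty
// range) yields NaN so that a large batch analysis keeps running.
double meanSketch(R)(R data)
if (isInputRange!R)
{
    double sum = 0;
    size_t n = 0;
    foreach (x; data)
    {
        sum += x;
        ++n;
    }
    return n == 0 ? double.nan : sum / n;
}

unittest
{
    assertThrown!Exception(exponentialCdfSketch(1.0, -2.0));
    assert(exponentialCdfSketch(-1.0, 2.0) == 0.0);

    double[] empty;
    assert(isNaN(meanSketch(empty)));
    assert(meanSketch([1.0, 2.0, 3.0]) == 2.0);
}
```

A caller looping over many datasets can then test each result with isNaN and skip the defective
ones, without wrapping every call in a try/catch.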

5.  License:  Each file contains a license header.  All modules that are exclusively written by
the main author (David Simcha) are licensed under the Boost license, so that pieces of them may
freely be incorporated into Phobos and attribution is not required for binaries.  Some modules
consist of code borrowed from other places and thus must conform to the terms of the original
licenses.  All are under permissive (i.e. non-copyleft) open source licenses, but some may require
binary attribution.