File Listing: S Transform, how Dwarf Therapist v22 derives %'s
Last Updated: Oct 23, 2014, 11:23:24 pm
First Created: Jul 13, 2014, 11:42:25 am
Author: thistleknot
Description
Opens with LibreOffice.

The attached sheet shows how a method called "S Transform" is applied to skewed data to 'normalize' it. Its basic principle: it uses the area of the distribution with the least disparity (between mean and median), and does a min-max transform around each side of the mean and of the median (i.e. min to mean to max; min to median to max).

Example: a min-max drawing is the same as an ordered drawing, but on a 0 to 1 scale. http://imgur.com/5rtUH5a

Any distribution with uniformly distinct values can be normalized using this method. What do I mean by 'uniformly distinct'? A distribution is not uniformly distinct when a single value dominates the majority of it, i.e. when the mode accounts for more than 50% of the values. We test whether a distribution's 1st quartile equals its median; if so, we call an alternative formula instead of the S Transform.*

*In that case we split the data into two parts: values <= median and values > median. We run what is basically a min-max on the > median values, and a rank-ECDF with a factor-derived value applied on the <= median values to ensure we achieve a .5 mean. Note: this method is described in another document. This spreadsheet covers distributions that have uniformly distinct values, such as attributes and traits.

The "S Transform" pushes the distribution toward the center. The old [beta v15] method used a flat distribution curve (imagine a die, which has equal 1/6 chances, but on a scale from 0 to 100%); that is how rank-ECDF worked, in other words ~= (rank / count). What you saw above is not a flat distribution but a curved one on a definite scale from 0 to 100%. It achieves a ~50% mean, with the min representing 0% and the max representing ~100% (see the CDF and PDF in the picture at the top).

How does it work?
This method always ensures we scale to min and max appropriately; if any values get transformed more than others, it's the values in between the median and the mean.

First we do a min-max transform from min to mean onto 0 to 50%, and from mean to max onto 50% to 100%. Then we repeat this on the output data in the same fashion, from min to median and from median to max. So then we do a min-max around the .5 value: we go from min to .5 as 0 to 50%, and from .5 to max as 50% to 100%.

This ensures any data will have a ~.5 mean and be scaled from 0 to 100%, while preserving more of the original meaning of the data than rank-ECDF allowed.
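The passes described above can be sketched in Python. This is a minimal sketch under stated assumptions, not Dwarf Therapist's actual code: the helper names (pivot_minmax, s_transform, rank_ecdf, is_uniformly_distinct) are hypothetical; the second pass takes the median of the first pass's output, as the text describes; rank-ECDF is read as "share of values <= v" (one way to get rank / count); and the quartile test uses the default method of Python's statistics.quantiles, which may not match the quartile calculation used in the sheet or in Dwarf Therapist. The alternative formula for data that is not uniformly distinct is described in another document and is not implemented here.

```python
from statistics import mean, median, quantiles


def pivot_minmax(values, pivot):
    """Hypothetical helper: piecewise min-max, [min, pivot] -> [0, 0.5], (pivot, max] -> (0.5, 1]."""
    lo, hi = min(values), max(values)
    out = []
    for v in values:
        if v <= pivot:
            # Lower side: min maps to 0 and the pivot maps to 0.5.
            # If pivot == min, all lower-side values equal the pivot; 0.5 is an assumed choice.
            out.append(0.5 * (v - lo) / (pivot - lo) if pivot > lo else 0.5)
        else:
            # Upper side: pivot maps to 0.5 and max maps to 1 (hi > pivot is guaranteed here).
            out.append(0.5 + 0.5 * (v - pivot) / (hi - pivot))
    return out


def s_transform(values):
    """Pass 1 pivots on the mean; pass 2 pivots on the median of pass 1's output."""
    step1 = pivot_minmax(values, mean(values))
    return pivot_minmax(step1, median(step1))


def rank_ecdf(values):
    """Older flat method, roughly rank / count: the share of values <= v."""
    n = len(values)
    return [sum(1 for x in values if x <= v) / n for v in values]


def is_uniformly_distinct(values):
    """Hypothetical guard: False when the 1st quartile equals the median (needs >= 2 values)."""
    q1 = quantiles(values, n=4)[0]  # default 'exclusive' method; an assumption
    return q1 != median(values)


scores = [1, 2, 3, 4, 10]
if is_uniformly_distinct(scores):
    print(s_transform(scores))  # approximately [0.0, 0.25, 0.5, 0.625, 1.0]
print(rank_ecdf(scores))  # [0.2, 0.4, 0.6, 0.8, 1.0]
```

On this sample the S Transform output has a mean of about 0.475, near the ~50% target, with min at 0 and max at 1, while rank-ECDF spreads the values evenly. The sketch stops after the median pass: at that point the minimum already sits at 0, the median at 0.5 and the maximum at 1, so reading the final "min to .5, .5 to max" step as one more pivot_minmax around 0.5 would leave the values unchanged.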

Checksum / Hash
SHA-256: 1c7acb465866dc5d7a48ca0d0c49b34548557ad581e3cd990c7ac0e3fcdbde68