DFFD - File: My proposed changes to how we base the % values on


Last Updated: Jun 15, 2014, 07:53:52 pm
First Created: Jun 15, 2014, 11:02:30 am

Author: thistleknot


File version:		v6
For DF version:		0.34.11
Downloads:	30	Size:	355.8 KB
Views:	521 (547)	Type:	ZIP

Description

TLDR?

The ECDF conversion works around the median and splits the distribution evenly above and below 50%. This is done by ranking the values from lowest to highest.

aka

rank position / count = % value pretty much.

However, when the same value is repeated, ECDF adds 1/count to the cumulative % for each repeat.

A distribution is flagged as skewed when ((count(median, distribution) / count) - .5) > .275.

For skewed distributions, what we do instead is remove the 0's before ECDF, run the remaining values through ECDF, then re-add the 0's.
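A minimal sketch of the rank / count idea, assuming the usual ECDF definition (the share of values less than or equal to the one being rated). The function name `ecdf` is my own and not taken from the spreadsheet.

```python
from bisect import bisect_right


def ecdf(values):
    """Return, for each value, the fraction of values that are <= it."""
    ordered = sorted(values)
    n = len(ordered)
    # bisect_right counts how many sorted entries are <= v, so every
    # repeat of a value pushes the cumulative % up by another 1/n.
    return [bisect_right(ordered, v) / n for v in values]


# Two dwarves share the value 20, so both land on 3/4 = 0.75.
print(ecdf([10, 20, 20, 40]))
```

Repeated values all receive the same percentage here, which is the top of their shared step.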
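One possible reading of the skew check, sketched in Python. I am assuming that count(median, distribution) means "how many values are equal to the median"; the name `is_skewed` and the `threshold` parameter are hypothetical.

```python
from statistics import median


def is_skewed(values, threshold=0.275):
    """Flag a distribution when values equal to the median dominate it."""
    med = median(values)
    # Share of the list that sits exactly on the median value.
    share_at_median = sum(1 for v in values if v == med) / len(values)
    return (share_at_median - 0.5) > threshold


# A skill list where most dwarves have 0 is flagged; an even spread is not.
print(is_skewed([0] * 8 + [1, 2]))
print(is_skewed([1, 2, 3, 4, 5]))
```

Under this reading a distribution is only flagged when more than 77.5% of its values sit on the median, which is what happens with skills when most dwarves have 0.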

More Info

Each of the values used for a role % is converted to an ecdf value based on its original base value, for example the strength # or agility #.

Skewed distributions (skewed due to a large number of 0 values, such as with skills) affect the result.

If you want to see what type of effect this has, look at cell J13 on the 'Role Miner' sheet and set the weight for Skills to 0.

I propose a new method for increasing the value added by skewed distributions compared with a plain minmax conversion: remove the 0's from the list, run an ecdf conversion on the remaining elements, then re-inject the 0's into the list.

To see the effect of switching between these two options, check out cell G8 on the 'Role Miner' sheet and change it from 0 to 1.
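The two options can be compared with a small sketch. This is not the spreadsheet's own formula; the function names are made up for illustration and the skill levels in the example are invented.

```python
from bisect import bisect_right


def minmax(values):
    """Plain min-max scaling to the 0..1 range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def ecdf_excluding_zeros(values):
    """Remove 0's, ECDF the remaining values, then put the 0's back as 0."""
    nonzero = sorted(v for v in values if v != 0)
    n = len(nonzero)
    # The division only runs for non-zero entries, so n is never 0 there.
    return [bisect_right(nonzero, v) / n if v != 0 else 0.0 for v in values]


skills = [0, 0, 0, 0, 1, 3, 15]  # invented skill levels, mostly untrained
print(minmax(skills))
print(ecdf_excluding_zeros(skills))
```

With this list the min-max mean is about 0.18 and the zero-excluded ecdf mean is about 0.29, while every 0 stays at 0.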

http://imgur.com/9DNeIRY

The first chart is how skills were initially proposed to look, i.e. a max-min conversion. You can see the steep jump in the blue line due to the slow rise in the value of a skill under the minmax conversion method.

The second graph shows the method of removing 0's from the list of a skewed distribution and running the remaining values through an ecdf conversion. This has the added benefit of normalizing the data that actually have values rather than 0.

The other interesting thing is that, because skewed data are skewed toward the far end of the distribution, such as the right tail (due to the majority of values being 0), the two skill drawing methods change the % added at this portion of the distribution. So mess with cell G8. I highly recommend Option 1: it gives a better curve and a boost to your values early on when any dwarf has a skill, it works the same way as the other ecdf values, and I figure always working with ecdf values gives the data a proper ordering with equal percent distances. Otherwise skills won't have this property the way the other distributions do.

Notes:

New method: increases the average of the skills, which is what I wanted. 0 remains 0, but it makes skills stand out more by increasing their average. The averages of the other distributions are near .5, while skills are skewed and could use any boost they can get once we scale them down using min-max. By doing this ecdf conversion I raise the mean a little, but it creates some slightly erratic comparisons, most likely because dt currently doesn't do that for skills?

Updated the way priorities are calculated.
=1-(B7/($B$1*100))
Basically: 1 - (Rank / ([Count of data sets] * 100))
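The same priority formula as a small Python sketch. The function and parameter names are my own; `rank` stands in for cell B7 and `dataset_count` for cell $B$1.

```python
def priority(rank, dataset_count):
    """1 - (Rank / (Count of data sets * 100))."""
    return 1 - rank / (dataset_count * 100)


# With 4 data sets, ranks 1..4 give weights just under 1 that only
# differ by a quarter of a percent from one rank to the next.
for rank in range(1, 5):
    print(rank, priority(rank, 4))
```

The weights stay very close to 1, so they nudge the values without overwhelming them.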

Update

http://imgur.com/wmEHQXg

If a skewed distribution (such as preferences or skills) has its skewed flag checked and is added to another distribution that does not have the skew flag checked, the output (as in the combined values of the role) is always rerun through another ecdf conversion to get a 50% above / 50% below rating.
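A rough sketch of that rerun, assuming a role is a weighted average of per-dwarf % lists. The names `combine_role`, `rate_role` and `any_skewed` are hypothetical, and the sample numbers are invented.

```python
from bisect import bisect_right


def ecdf(values):
    """Fraction of values that are <= each value."""
    ordered = sorted(values)
    n = len(ordered)
    return [bisect_right(ordered, v) / n for v in values]


def combine_role(components, weights):
    """Weighted average of several per-dwarf % lists (an assumed role formula)."""
    total = sum(weights)
    dwarves = len(components[0])
    return [
        sum(w * column[i] for column, w in zip(components, weights)) / total
        for i in range(dwarves)
    ]


def rate_role(components, weights, any_skewed):
    """Rerun the combined values through ECDF when a skewed input was used."""
    combined = combine_role(components, weights)
    return ecdf(combined) if any_skewed else combined


attributes = [0.2, 0.4, 0.6, 0.8, 1.0]  # already ecdf values
skills = [0.0, 0.0, 0.0, 0.5, 1.0]      # skewed, mostly 0
print(combine_role([attributes, skills], [1, 1]))
print(rate_role([attributes, skills], [1, 1], any_skewed=True))
```

Before the rerun the combined values bunch up near the bottom because of the 0's; after it they are spread evenly again, in the same order.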

ToDo:

Preference conversions.

Include attribute potential math

Include Skill rating potential

Figure out if I need to rerun all role outputs through another ecdf conversion just prior to labor optimization. If that is the case, then there is again no direct comparison between roles, only an ecdf conversion, and I should also remove the ecdf conversion mentioned just above in bold, as it would be done on the labor optimization end.

Ideally, we want to draw 50% of the dwarfs as good and 50% as bad, but the purpose of this was to also allow for direct comparison between roles, and it seems like I'm getting into territory again where I can't do that (because skills/preferences suppress roles).

However, non-skewed distributions never need another ecdf pass, even after combining them all. That means roles that don't use preferences or skills are not skewed and don't need this ecdf conversion, as half the values will naturally fall above 50% and half below.

Case in point, see how cell J13 on the 'Role Miner' sheet affects the mean and median listed in cells AC9 and AC10.

Update:

For any role that is developed from a combination of ecdf and/or skewed ecdf distributions, I firmly believe in running the output through an ecdf for direct comparison with other values.

This means that in the gridview, the values are drawn for everything.

Labor optimization does its own internal normalization from the roles selected, but it's basically priority adjustments to further adjust the %'s, since the input values have already been normalized to the best of their ability. It will just compare the ordinal ranking of the distributions against each other based on their priorities in the gridview, calculate a new priority, and multiply it against the values listed, plus the priorities applied by the role optimizations. This will help adjust the %'s a little bit more.

I learned this concept from Artificial Neural Networks and their weights. The weights are updated on backpropagation, which is based on a sigmoid function that has almost infinite variability but falls within 0 and 1. I figured I'd do something similar and create a default priority weight that assigns a small signature to the data it processes. The signature doesn't affect a value enough to raise it above or below its neighbor ordinal equivalents (i.e. the ordinal position of other labors in labor optimization), but it gives one value a direct +1/-1 against its neighbor. This should allow for full combination of values while only allowing values that measure the rank of data within a distribution, so that each specific value has a direct ordinal ranking against its neighbor equivalents (i.e. the same ranked position in another role will have either a + or - effect).

I based this priority on the original average of the categories being compared, for example strength average vs agility average. This would result in rank positions 1 and 2 respectively, as strength > agility in DF (average value).
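A sketch of ranking categories by their original averages, highest first. The function name is hypothetical and the attribute numbers are illustrative placeholders, not values read from the spreadsheet.

```python
def rank_by_average(category_values):
    """Map each category name to its rank position, 1 = highest average."""
    means = {
        name: sum(values) / len(values)
        for name, values in category_values.items()
    }
    ordered = sorted(means, key=means.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


# Placeholder raw attribute values for three dwarves.
raw = {
    "agility": [800, 900, 1000],
    "strength": [1150, 1250, 1350],
}
print(rank_by_average(raw))
```

Strength has the higher average in this example, so it takes rank position 1 and agility takes position 2.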


Checksum / Hash

SHA-256: c510d75061d3e3bb279984b339b4885b6090e31f8cf5bf75538f60d30dd5c28a
