Explanations
numerical values resembling permits, approvals, or rankings
the presence of numeric values
Negative Logits
Token       Logit
istg        -0.70
lemon       -0.65
hopes       -0.63
thumbs      -0.63
bas         -0.61
retired     -0.60
Pixie       -0.58
rette       -0.57
biography   -0.57
ãĤ©         -0.56
Positive Logits
Token       Logit
11          3.18
12          2.28
13          2.15
10          2.12
14          2.02
15          1.96
16          1.96
17          1.95
21          1.91
22          1.86
Activations Density 0.020%