Explanations
proper nouns or names
the character "K."
New Auto-Interp
Negative Logits
understatement  -0.60
Romanian        -0.57
agric           -0.57
stat            -0.56
summ            -0.56
stuffing        -0.56
nort            -0.55
overw           -0.55
actionGroup     -0.55
defic           -0.55
Positive Logits
K     3.51
KS    2.32
k     2.19
KI    2.09
KK    2.06
KR    2.06
KA    1.95
KER   1.89
KT    1.89
K     1.88
Activation Density: 0.027%
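Activation density here is assumed to mean the fraction of token positions on which the feature fires (has a positive activation); the dashboard does not state its exact definition. Under that assumption, 0.027% corresponds to about 27 active positions per 100,000 tokens. The minimal sketch below illustrates the arithmetic; the function name `activation_density` and the synthetic activation list are hypothetical and do not come from the dashboard.

```python
def activation_density(activations):
    """Fraction of token positions with a positive activation (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > 0)
    return active / len(activations)


# Hypothetical synthetic data: 100,000 token positions, 27 of them active.
acts = [0.0] * 100_000
for i in range(27):
    acts[i * 1000] = 1.5  # arbitrary positive activation value

density = activation_density(acts)
print(f"{density:.3%}")  # 0.027%
```

Counting only strictly positive values matches the usual convention for ReLU-style features, where an inactive feature outputs exactly zero.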