Explanations
comparative or quantitative terms like "relative."
Negative Logits
eret      -0.85
enegger   -0.84
spr       -0.83
ERG       -0.83
schild    -0.81
tein      -0.81
storm     -0.79
otle      -0.77
alon      -0.76
alos      -0.76
Positive Logits
humidity    1.09
ease        0.84
abund       0.82
newcomer    0.82
pronoun     0.78
importance  0.78
inexper     0.78
merits      0.77
anonymity   0.77
ability     0.77
Activations Density: 0.031%