INDEX
Explanations
phrases related to legal or political contexts
Negative Logits
NB          -0.86
bg          -0.80
Alert       -0.77
helm        -0.76
another     -0.74
aido        -0.73
otte        -0.72
restling    -0.71
zilla       -0.70
cook        -0.69
Positive Logits
greatest     1.19
majority     1.19
predominant  1.18
quickest     1.15
wealthiest   1.14
safest       1.12
simplest     1.12
easiest      1.12
vast         1.09
stakes       1.09
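The two lists above are most naturally read as the most extreme entries of a per-token score for this feature. The dashboard does not show how that score is produced; a common choice is the feature's decoder direction projected onto the unembedding matrix, but that is an assumption here. The minimal sketch below only shows the ranking step, using the values copied from the lists above (the full vocabulary-sized score vector is not available). The helper `top_k` is hypothetical.

```python
# Token scores copied from the dashboard lists above. This is only the
# displayed subset; the full per-vocabulary score vector is not shown.
scores = {
    "NB": -0.86, "bg": -0.80, "Alert": -0.77, "helm": -0.76, "another": -0.74,
    "aido": -0.73, "otte": -0.72, "restling": -0.71, "zilla": -0.70, "cook": -0.69,
    "greatest": 1.19, "majority": 1.19, "predominant": 1.18, "quickest": 1.15,
    "wealthiest": 1.14, "safest": 1.12, "simplest": 1.12, "easiest": 1.12,
    "vast": 1.09, "stakes": 1.09,
}


def top_k(scores, k, largest=True):
    """Hypothetical helper: return the k (token, score) pairs with the
    highest scores (largest=True) or the lowest scores (largest=False).
    Python's sort is stable, so ties keep their insertion order."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=largest)[:k]


positive = top_k(scores, 3, largest=True)
negative = top_k(scores, 3, largest=False)
print(positive)  # [('greatest', 1.19), ('majority', 1.19), ('predominant', 1.18)]
print(negative)  # [('NB', -0.86), ('bg', -0.8), ('Alert', -0.77)]
```

Sorting the whole dictionary is fine at this size. For a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` avoids a full sort.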
Activation Density: 0.314%
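The page does not define activation density. A common reading, assumed here, is the fraction of sampled tokens on which the feature fires, meaning its activation exceeds zero. Under that assumption, 0.314% means the feature is active on roughly 3 of every 1,000 tokens. The minimal sketch below uses made-up activations, not data from this feature, and the function name and `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical helper: fraction of tokens whose feature activation
    is strictly greater than `threshold` (assumed definition of density)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Made-up example: 1000 tokens, of which 3 have a positive activation.
acts = [0.0] * 997 + [2.1, 0.7, 1.4]
print(f"{activation_density(acts):.3%}")  # 0.300%
```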