INDEX
Explanations
phrases describing tendencies or inclinations
Negative Logits
Token      Logit
zbek       -0.68
yz         -0.63
gur        -0.62
Polo       -0.62
loo        -0.55
pelling    -0.54
terday     -0.53
ZA         -0.52
Slate      -0.52
fts        -0.52

Positive Logits
Token      Logit
rils       1.33
entious    1.24
toward     1.14
towards    1.07
to         1.01
ril        0.93
entimes    0.90
erest      0.84
ered       0.79
erers      0.75
Activations Density: 0.035%