Explanations
expressions of preference or emphasis in speech
Negative Logits
Token      Logit
oran       -0.17
ilor       -0.16
velt       -0.16
ales       -0.14
amos       -0.14
atories    -0.14
587        -0.14
apur       -0.14
uvre       -0.14
gress      -0.13
Positive Logits
Token      Logit
anch       0.15
rif        0.15
apiro      0.15
Minority   0.14
teng       0.14
енко       0.14
modest     0.13
æ·         0.13
WikiLeaks  0.13
SAT        0.13
Activations Density 0.002%