INDEX
Explanations
conditional and hypothetical statements related to societal issues
Negative Logits
rang      -0.17
ilor      -0.16
orate     -0.15
.pretty   -0.15
stor      -0.15
accion    -0.14
este      -0.14
alone     -0.14
bid       -0.14
nors      -0.14
Positive Logits
maur        0.15
ocket       0.15
éĩı         0.15
irr         0.13
unny        0.13
erli        0.13
isu         0.13
odesk       0.13
indo        0.13
foregoing   0.13
Activations Density: 0.093%