Explanations
statements or assertions related to opinions or beliefs
Negative Logits
abwe     -0.20
apiro    -0.18
箱       -0.15
rei      -0.14
reich    -0.14
upo      -0.14
esthes   -0.14
hurst    -0.14
_rsa     -0.14
_MPI     -0.13
Positive Logits
eland        0.17
yourselves   0.15
dum          0.14
/OR          0.14
mage         0.14
/IP          0.14
Lazar        0.14
Gunn         0.14
Nas          0.13
ือด          0.13
Activations Density 0.050%