INDEX
Explanations
words related to statements or opinions
statements or assertions made by advocates or officials
Negative Logits
Token       Logit
Himself     -0.94
ï¸          -0.70
ascript     -0.67
à¦          -0.65
SourceFile  -0.65
Lex         -0.65
crow        -0.65
ufact       -0.64
artist      -0.63
icut        -0.63
Positive Logits
Token       Logit
they         0.99
majorities   0.71
there        0.70
theirs       0.69
otherwise    0.69
goodbye      0.67
it           0.64
that         0.64
alike        0.63
loopholes    0.63
Activation Density: 0.156%