INDEX
Explanations
words related to social issues and criticisms of authority
Negative Logits
amu         -0.17
Dare        -0.15
anchise     -0.15
jour        -0.15
th          -0.15
tac         -0.15
PD          -0.14
hak         -0.14
PD          -0.14
么          -0.14
Positive Logits
arus         0.15
otec         0.14
.Classes     0.14
šku          0.14
народ        0.14
cz           0.14
Rangers      0.14
(Table       0.14
ogh          0.14
avicon       0.14
Activations Density 0.028%