INDEX
Explanations
references to historical events and national identity
Negative Logits
(~           -0.17
ơi           -0.15
(<           -0.15
âh           -0.14
pper         -0.14
Ãİ           -0.14
ffer         -0.14
fucked       -0.14
Lowest       -0.14
Fuck         -0.14
Positive Logits
tonight      0.19
``           0.16
``           0.15
rede         0.15
atto         0.15
decency      0.15
(ph          0.15
ton          0.14
----↵        0.14
Governments  0.14
Activations Density: 0.006%