Explanations
special characters and formatting in the text
Negative Logits
("    -0.21
('    -0.18
"     -0.18
(“    -0.18
—     -0.17
(&    -0.17
"'    -0.17
-'    -0.16
--    -0.16
ðŁ    -0.16
Positive Logits
duke          0.17
Indonesian    0.17
Indonesia     0.16
Duke          0.15
jab           0.15
Facts         0.15
facts         0.14
jad           0.14
okus          0.14
politic       0.14
Activations Density 0.002%