INDEX
Explanations
special characters or unique symbols in text
NEGATIVE LOGITS
ighter      -0.16
VH          -0.15
orda        -0.15
alom        -0.14
illow       -0.14
eln         -0.14
ripper      -0.14
LH          -0.14
‘           -0.14
Hamburg     -0.13
POSITIVE LOGITS
Indigenous  0.35
Stanley     0.30
Indians     0.28
Native      0.26
Stan        0.26
colon       0.25
colonial    0.25
Colonial    0.25
Colon       0.24
Indian      0.24
Activations Density 0.003%