INDEX
Explanations
references to colonialism and related concepts
Negative Logits
udy      -0.17
ouver    -0.16
aly      -0.15
repid    -0.14
rias     -0.14
reds     -0.14
óm       -0.14
lik      -0.14
conomy   -0.14
Elect    -0.14
Positive Logits
inch     0.17
TMPro    0.16
خاÙĨÙĩ    0.14
-era     0.14
tü       0.14
ors      0.14
cratch   0.14
Farrell  0.13
ìĭ¬      0.13
ãģĤãĤĬ   0.13
Activations Density 0.030%