INDEX
Explanations
mentions of colleagues and themes related to colonialism
NEGATIVE LOGITS
lings      -0.16
kova       -0.16
ultur      -0.15
ris        -0.15
ald        -0.15
ouch       -0.15
wald       -0.15
Hao        -0.15
illance    -0.14
elerik     -0.14
POSITIVE LOGITS
ombo       0.22
onna       0.19
/un        0.19
iseum      0.18
cy         0.18
apsed      0.17
worth      0.16
pedia      0.15
gate       0.15
Carm       0.15
Activations Density 0.051%