INDEX
Explanations
references to sociopolitical issues and injustices
Negative Logits
iku     -0.15
oste    -0.14
oogle   -0.14
awai    -0.14
ekler   -0.14
idth    -0.14
eward   -0.14
iken    -0.14
ooth    -0.14
emat    -0.14
Positive Logits
universal   0.17
universal   0.17
üle         0.16
transc      0.16
PURE        0.15
ìĥī         0.15
-cross      0.15
vore        0.14
across      0.14
Freed       0.14
Activations Density: 0.181%