Explanations
references to slavery and related terms
Negative Logits
[to         -0.15
istory      -0.15
erif        -0.15
crow        -0.14
ismus       -0.14
esser       -0.14
tick        -0.14
ÏĥÏĦε       -0.14
tega        -0.13
اÙĦÙĤ        -0.13
Positive Logits
enko        0.17
owl         0.16
ãĥĥãĤ·ãĥ¥   0.15
acia        0.15
Shack       0.15
geber       0.15
rypt        0.14
orum        0.14
utor        0.14
******************************************************************************↵  0.14
Activations Density 0.010%