INDEX
Explanations
references to individuals' thoughts, beliefs, and descriptions
Negative Logits
eld       -0.16
reich     -0.16
zbek      -0.15
fahren    -0.14
ÙĩرÙĩ      -0.14
xCD       -0.14
apiro     -0.13
aver      -0.13
iran      -0.13
nie       -0.13
Positive Logits
holm      0.16
375       0.15
hone      0.15
etur      0.15
uten      0.15
iber      0.14
Sor       0.14
ycz       0.14
ettle     0.14
iyon      0.14
Activations Density: 0.217%