INDEX
Explanations
the name "Hor," which appears frequently throughout the document
Negative Logits
rosso   -0.17
\db     -0.16
ena     -0.16
Sco     -0.15
eva     -0.15
icht    -0.15
Proud   -0.15
ardy    -0.15
ego     -0.15
oppel   -0.14
Positive Logits
umes    0.17
onya    0.16
YLE     0.16
åĢij     0.16
affle   0.15
循      0.15
alty    0.15
ail     0.15
undle   0.15
عر      0.15
Activations Density: 0.009%