INDEX
Explanations
specific names, likely representing people or significant figures in the text
Negative Logits
bsolute  -0.16
angkan  -0.15
řízení  -0.14
دانلود  -0.14
esteem  -0.13
aimassage  -0.13
رخ  -0.13
luž  -0.13
lords  -0.13
ülük  -0.13
Positive Logits
.'  0.14
-chan  0.14
Singh  0.14
.’  0.14
&A  0.13
&C  0.13
.J  0.13
—who  0.13
ately  0.13
.K  0.13
Activations Density 0.098%