INDEX
Explanations
mentions of specific names associated with individuals or institutions, particularly in an artistic or medical context
Negative Logits
ton      -0.19
er       -0.19
teil     -0.17
eum      -0.17
teen     -0.17
tones    -0.16
gne      -0.16
angelo   -0.16
onta     -0.16
gio      -0.16
Positive Logits
vironment  0.23
ninger     0.22
ning       0.20
jamin      0.19
eya        0.19
ITHER      0.18
GLISH      0.18
rir        0.18
kil        0.17
ault       0.17
Activations Density: 0.034%