INDEX
Explanations
mentions of specific individuals by name combined with a position or title
the presence of names that start with "Ade."
Negative Logits
ipeg        -0.85
ivity       -0.80
urers       -0.76
lessness    -0.70
enegger     -0.66
orem        -0.66
sburgh      -0.66
urally      -0.66
iew         -0.66
imation     -0.66
Positive Logits
lled        0.92
cki         0.85
llan        0.82
vice        0.79
lli         0.78
lla         0.78
hani        0.76
utic        0.73
aways       0.72
hyde        0.72
Activations Density 0.040%