INDEX
Explanations
proper names
mentions of a specific individual named Ke
Negative Logits
etheless            -0.82
ashtra              -0.75
schild              -0.74
tremend             -0.70
rition              -0.70
sburgh              -0.69
åħī (光)            -0.69
raints              -0.68
ancial              -0.68
guiActiveUnfocused  -0.68
Positive Logits
ptic                 1.03
Ke                   1.03
SPA                  0.99
eling                0.98
lem                  0.95
cker                 0.95
gan                  0.95
aton                 0.94
isha                 0.92
bye                  0.92
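The two logit lists rank vocabulary tokens by how much this feature raises or lowers their output logits. The card does not say how these scores were computed; a common approach, assumed here, is to take the dot product of the feature's decoder direction with each token's unembedding vector and sort the results. The sketch uses hypothetical helper names (`logit_effects`, `top_logits`) and made-up two-dimensional vectors rather than real model weights, so its numbers do not reproduce the values listed above.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, float]

def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Score each token by the dot product of the feature's decoder
    direction with that token's unembedding vector (assumed method)."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }

def top_logits(effects: Dict[str, float], k: int) -> Tuple[List[Pair], List[Pair]]:
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return positive, negative

# Toy 2-d vectors chosen for illustration only; not real model weights.
decoder_dir = [1.0, -0.5]
unembed = {
    "Ke": [1.0, 0.0],
    "SPA": [0.5, 0.5],
    "schild": [-0.5, 0.5],
    "etheless": [-1.0, 0.0],
}
pos, neg = top_logits(logit_effects(decoder_dir, unembed), k=2)
print(pos)  # [('Ke', 1.0), ('SPA', 0.25)]
print(neg)  # [('etheless', -1.0), ('schild', -0.75)]
```

Real dashboards may also apply normalization (for example, folding in a final layer norm) before ranking; this sketch omits any such step.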
Activations Density 0.004%
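Activation density is the fraction of token positions on which the feature fires; 0.004% corresponds to roughly 4 positions in every 100,000 tokens. A minimal sketch, assuming "fires" means an activation strictly above zero (the card does not state its threshold), with a hypothetical `activation_density` helper and toy data:

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    return active / total if total else 0.0

# Toy data: the feature fires on 4 of 100,000 token positions.
acts = [0.0] * 100_000
for i in (10, 500, 7_000, 99_999):
    acts[i] = 1.2
density = activation_density(acts)
print(f"{density:.3%}")  # 0.004%
```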