Explanations
proper nouns, particularly names of individuals or locations
names of individuals and specific terms related to entities or events
Negative Logits
encing    -0.83
encers    -0.76
ively     -0.75
isions    -0.70
iring     -0.70
apers     -0.69
ision     -0.69
iple      -0.68
ires      -0.68
arily     -0.68
Positive Logits
ishop     0.84
wana      0.82
keley     0.81
halla     0.80
hari      0.79
ruary     0.75
achelor   0.75
EGIN      0.75
reath     0.74
axter     0.73
Activations Density: 0.130%