Explanations
proper nouns, in particular names of people or places
proper nouns, specifically names of people
Negative Logits
eleph       -0.85
ccording    -0.84
CLASSIFIED  -0.76
laun        -0.74
behavi      -0.74
Þ           -0.72
pione       -0.70
tremend     -0.70
Reloaded    -0.70
conflic     -0.69
Positive Logits
len         0.93
kowski      0.91
eson        0.91
zon         0.89
ley         0.87
son         0.86
axis        0.85
idge        0.85
enson       0.84
ridge       0.84
Activations Density: 0.146%