INDEX
Explanations
people's names, locations, and organizations
Negative Logits
manship    -0.71
pload      -0.70
kins       -0.69
heads      -0.67
head       -0.65
ptive      -0.64
ة          -0.64
STON       -0.64
kers       -0.63
gb         -0.62
Positive Logits
ral        1.25
ropy       1.20
rip        1.13
een        1.12
ribution   1.12
orian      1.12
rained     1.12
rum        1.11
rations    1.10
uary       1.10
Activations Density: 2.498%