INDEX
Explanations
references to individuals' names in text
names or mentions of specific individuals
Negative Logits
liest      -0.87
ansas      -0.76
drawn      -0.68
reluct     -0.68
nesota     -0.67
ccording   -0.66
coroner    -0.66
termin     -0.65
moving     -0.64
itars      -0.63
Positive Logits
otti       1.54
ota        1.00
ucci       0.99
ón         0.93
olini      0.90
etta       0.85
ogl        0.84
orno       0.83
arella     0.81
aceae      0.79
Activations Density: 0.004%