Explanations
names of individuals
the presence of specific names, particularly related to well-known figures
Negative Logits
enegger      -0.77
Dartmouth    -0.74
oster        -0.72
chie         -0.70
refere       -0.67
Islanders    -0.67
inances      -0.65
ingham       -0.64
Kling        -0.62
EEK          -0.62
Positive Logits
Sonia        1.20
Gandhi       0.92
eus          0.88
uria         0.84
inia         0.83
":""},{"     0.82
ata          0.78
OTUS         0.75
otom         0.73
Tome         0.72
Activations Density 0.012%