INDEX
Explanations
names of individuals, possibly with particular initials (such as W.F.) and surnames
references to notable individuals or historical figures
Negative Logits
culosis      -0.69
vous         -0.66
ufact        -0.66
TVs          -0.65
alore        -0.64
basketball   -0.62
cffff        -0.62
incial       -0.61
Pg           -0.61
duino        -0.61
Positive Logits
stad         0.85
aley         0.83
Roberts      0.76
burn         0.73
gren         0.73
mann         0.70
hyde         0.70
enstein      0.69
III          0.67
agos         0.67
Activations Density: 0.200%