INDEX
Explanations
names of celebrities, prominent figures or athletes
Negative Logits
awks        -0.68
ories       -0.66
uate        -0.66
iliate      -0.63
Els         -0.63
raq         -0.62
sequence    -0.62
Load        -0.61
rals        -0.60
olulu       -0.60
Positive Logits
Sr            1.35
Jr            1.31
III           1.05
aka           0.94
ovich         0.91
Jr            0.87
Productions   0.86
Returns       0.86
greets        0.83
IV            0.82
Activations Density: 0.292%