INDEX
Explanations
proper names and titles, particularly related to people and their roles
Negative Logits
yip        -0.77
ulhu       -0.70
ById       -0.67
Premium    -0.66
ishable    -0.63
nces       -0.62
vy         -0.62
oyer       -0.61
monds      -0.61
OTE        -0.60
Positive Logits
Dud        0.74
Dos        0.73
Gaal       0.71
Garcia     0.69
Wilhelm    0.68
Luigi      0.68
Maced      0.66
Maria      0.65
Sawyer     0.65
Ferdinand  0.65
Activations Density 0.034%