INDEX
Explanations
keywords related to personal names and titles, especially when used in formal contexts
proper nouns and names of people
Negative Logits
formance    -0.78
inar        -0.77
ials        -0.71
versions    -0.69
oids        -0.68
emort       -0.68
itized      -0.68
ils         -0.66
rogens      -0.66
iren        -0.65
Positive Logits
NYU         0.79
University  0.77
UCLA        0.77
McGill      0.76
University  0.75
Harvard     0.75
Managing    0.74
Duke        0.72
Stanford    0.72
Nort        0.71
Activations Density: 0.888%