INDEX
Explanations
proper nouns
possessive pronouns indicating ownership or association
NEGATIVE LOGITS
Token        Logit
Blumenthal   -0.70
unker        -0.70
Hussein      -0.69
Mahm         -0.68
Ukrain       -0.64
Lerner       -0.62
Logged       -0.61
Huma         -0.61
Haf          -0.59
whereas      -0.58
POSITIVE LOGITS
Token        Logit
own          1.62
respective   0.95
OWN          0.91
Own          0.89
customary    0.89
usual        0.88
selves       0.86
stride       0.85
entire       0.84
hometown     0.82
Activations Density: 0.308%