INDEX
Explanations
phrases describing people, their attributes, or their relationships with others
references to individuals identified by the word "whose."
Negative Logits
hal      -0.71
����     -0.70
enge     -0.70
yi       -0.68
edd      -0.68
CLASS    -0.67
hari     -0.66
pn       -0.65
yang     -0.65
rator    -0.65
Positive Logits
sole           1.11
ancestors      1.06
own            1.04
whereabouts    0.91
estates        0.87
fault          0.84
entire         0.83
opinions       0.83
names          0.82
namesake       0.82
Activations Density 0.031%