INDEX
Explanations
mentions or descriptions of individuals
references to individuals or entities, particularly with the word "whose."
Negative Logits
yi      -0.75
hal     -0.70
DT      -0.70
DI      -0.69
âĨ      -0.69
isp     -0.68
edd     -0.67
hari    -0.67
SPA     -0.67
PL      -0.67
Positive Logits
ancestors       1.04
own             1.03
sole            0.98
grandchildren   0.87
OWN             0.85
parents         0.82
predecessors    0.81
namesake        0.81
deepest         0.80
fault           0.79
Activations Density: 0.025%