INDEX
Explanations
phrases referring to specific individuals or groups
references to individuals or entities in sentences
Negative Logits
yi        -0.75
edd       -0.74
hal       -0.72
PRESS     -0.71
DT        -0.70
pn        -0.68
yang      -0.68
MSN       -0.68
isp       -0.68
PL        -0.67
Positive Logits
ancestors      1.09
own            1.01
sole           0.98
namesake       0.84
grandparents   0.84
predecessors   0.84
parents        0.82
grandchildren  0.81
OWN            0.79
interests      0.78
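Dashboards of this kind commonly derive the two logit lists by projecting the feature's decoder direction through the model's unembedding matrix and sorting the resulting per-token scores. The sketch below assumes that method; `decoder_dir`, `W_U`, and `top_bottom_logits` are hypothetical names, and a real tool may apply extra normalization before projecting.

```python
import numpy as np

def top_bottom_logits(decoder_dir, W_U, vocab, k=10):
    """Hypothetical helper: project a feature's decoder direction onto the
    unembedding and return the k most positive and k most negative tokens."""
    # decoder_dir: (d_model,), W_U: (d_model, vocab_size) -> logits: (vocab_size,)
    logits = decoder_dir @ W_U
    order = np.argsort(logits)  # ascending: most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy usage with an identity "unembedding" over a 4-token vocabulary.
vocab = ["a", "b", "c", "d"]
pos, neg = top_bottom_logits(np.array([0.5, -1.0, 2.0, 0.0]), np.eye(4), vocab, k=2)
```

Sorting once and slicing both ends keeps the positive and negative lists consistent with each other.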
Activation density: 0.019%
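Activation density is usually read as the fraction of token positions on which the feature fires, so 0.019% corresponds to roughly 19 tokens in every 100,000. A minimal sketch, assuming "fires" means an activation strictly above zero; `activation_density` is a hypothetical helper, not a specific tool's API.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Hypothetical helper: fraction of token positions whose activation
    exceeds the threshold (assumed to be 0 here)."""
    acts = np.asarray(acts)
    return float((acts > threshold).mean())

# Toy check: 19 active positions out of 100,000 tokens.
acts = np.zeros(100_000)
acts[:19] = 1.0
density = activation_density(acts)
density_percent = density * 100
```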