INDEX
Explanations
references to individuals
mentions of "person"
Negative Logits
actionGroup    -0.75
enthal         -0.69
unctions       -0.69
要             -0.68
Equal          -0.68
Lions          -0.67
Lans           -0.65
Growing        -0.65
使             -0.65
DL             -0.64
Positive Logits
hood           1.09
nel            0.91
wise           0.77
else           0.74
istics         0.72
uscript        0.71
acles          0.71
acle           0.71
aganda         0.70
else           0.70
Activations Density 0.030%