INDEX
Explanations
names or titles related to individuals
Negative Logits
DERR         -0.74
actionGroup  -0.68
ORK          -0.66
Lans         -0.65
Accessory    -0.65
unctions     -0.65
osterone     -0.65
PDATE        -0.64
å¾           -0.64
CHECK        -0.61
Positive Logits
hood         1.45
nel          1.13
uscript      1.00
acles        0.99
age          0.96
nels         0.95
ages         0.92
afort        0.91
ified        0.90
oko          0.89
Activations Density: 0.043%