Explanations
No Explanations Found
Negative Logits
efficients   -0.77
fman         -0.73
elman        -0.71
egu          -0.70
Quincy       -0.69
oki          -0.69
ante         -0.68
iren         -0.67
ierrez       -0.67
reau         -0.66
Positive Logits
regon        0.69
CN           0.65
padd         0.63
strip        0.63
merge        0.62
Soccer       0.59
sport        0.59
lett         0.59
fencing      0.58
nort         0.58
Activations Density 0.000%
No Known Activations
This feature has no known activations.