Explanations
No Explanations Found
Negative Logits
illac      -0.82
ebted      -0.80
oton       -0.79
regor      -0.78
Tanz       -0.76
istors     -0.71
iership    -0.70
phies      -0.69
amiya      -0.69
erella     -0.69
Positive Logits
educ       0.82
II         0.77
SD         0.74
intend     0.68
sic        0.64
intendent  0.64
Eng        0.64
Te         0.63
ificantly  0.62
TE         0.62
Activations Density: 0.000%
No Known Activations
This feature has no known activations.