Explanations
No Explanations Found
Negative Logits
Shant       -0.75
Mord        -0.74
spr         -0.66
Dir         -0.63
dawn        -0.63
Shap        -0.62
topia       -0.61
emer        -0.60
adulthood   -0.60
assisted    -0.60
Positive Logits
incent      0.85
oir         0.83
econom      0.76
emouth      0.76
BIL         0.73
ewitness    0.73
ADRA        0.70
_-          0.68
olics       0.68
nesota      0.68
Activations Density 0.000%
No Known Activations
This feature has no known activations.