Explanations
No Explanations Found
Negative Logits
ils       -0.81
enced     -0.75
ital      -0.75
upp       -0.74
isco      -0.73
eport     -0.72
ution     -0.70
isites    -0.68
ence      -0.68
lav       -0.67
Positive Logits
Gamble        0.69
Phi           0.62
Num           0.58
counted       0.58
bruising      0.57
rogens        0.57
Slayer        0.56
Heidi         0.56
Newsletter    0.56
bott          0.56
Activations Density: 0.000%
No Known Activations
This feature has no known activations.