Explanations
No Explanations Found
Negative Logits
Token        Logit
garn         -0.69
MPG          -0.66
neut         -0.64
augmented    -0.63
dexter       -0.62
compliment   -0.61
antibiotic   -0.61
DEP          -0.60
owed         -0.59
fou          -0.59
Positive Logits
Token        Logit
ACTED        0.85
)).          0.75
]).          0.69
ates         0.68
STUD         0.68
:]           0.68
views        0.68
orical       0.68
kat          0.66
Ms           0.66
Activations Density: 0.000%
No Known Activations
This feature has no known activations.