INDEX
Explanations
No Explanations Found
Negative Logits
ahime        -0.80
avorite      -0.70
Choice       -0.68
hire         -0.65
AFTA         -0.65
mire         -0.62
virginity    -0.62
mercy        -0.62
emouth       -0.62
protection   -0.62
Positive Logits
liv          0.71
spec         0.69
endas        0.63
conom        0.62
dot          0.61
trem         0.61
rep          0.61
Spec         0.60
ische        0.59
tem          0.59
Activations Density: 0.000%
No Known Activations
This feature has no known activations.