Explanations
No explanations found.
Negative Logits
smelly          0.91
ны              0.89
stabbing        0.89
panties         0.88
nerdy           0.88
flavon          0.87
humanist        0.87
dysfunctional   0.87
itchy           0.87
knowledgeable   0.85
Positive Logits
i    1.10
H    1.03
S    1.00
X    0.98
N    0.97
W    0.96
U    0.91
B    0.91
V    0.89
D    0.88
Activation density: 0.000%
This feature has no known activations.