INDEX
Explanations
No Explanations Found
Head Attr Weights
0:0.08
1:0.07
2:0.08
3:0.09
4:0.08
5:0.09
6:0.07
7:0.09
8:0.08
9:0.08
10:0.07
11:0.09
Negative Logits
acus      -2.27
cius      -2.27
insk      -2.23
istar     -2.03
esa       -1.99
ube       -1.98
yang      -1.96
gemony    -1.96
erest     -1.95
utan      -1.93
Positive Logits
spoiler        1.78
KH             1.76
copied         1.69
needless       1.69
EDITION        1.68
prolific       1.61
mistaken       1.57
unreliable     1.56
HIM            1.56
miscarriage    1.55
Activations Density 0.000%
This feature has no known activations.