INDEX
Explanations
No Explanations Found
Head Attr Weights
0:0.07
1:0.06
2:0.09
3:0.09
4:0.08
5:0.08
6:0.09
7:0.07
8:0.08
9:0.08
10:0.08
11:0.08
Negative Logits
arta       -1.97
��         -1.63
cy         -1.54
paralle    -1.54
oğ         -1.53
CHAT       -1.53
odes       -1.52
��         -1.50
Aad        -1.49
phones     -1.48
Positive Logits
sacrific    1.99
foundation  1.76
generously  1.62
boldly      1.57
confessed   1.52
itiz        1.52
groundwork  1.50
vowed       1.50
slaught     1.49
sincerely   1.49
Activations Density 0.000%
No Known Activations
This feature has no known activations.