INDEX
Explanations
No Explanations Found
Negative Logits
0.66
↵↵
0.57
L
0.54
E
0.54
F
0.53
Sh
0.52
"
0.51
.
0.50
G
0.49
Or
0.48
Positive Logits
0.97
<unused339>
0.93
ناول
0.93
<unused2157>
0.91
<unused1493>
0.91
<unused1887>
0.90
<unused2169>
0.89
posticis
0.89
<unused1398>
0.89
0.88
Activations Density 7.579%