INDEX
Explanations
inquiries and expressions related to permission, guidance, and requests for clarification
Head Attr Weights
0:0.03
1:0.04
2:0.05
3:0.02
4:0.05
5:0.47
6:0.06
7:0.03
8:0.05
9:0.04
10:0.09
11:0.04
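The head attribution weights above can be summarized by finding the head with the largest weight. The sketch below is a minimal illustration, assuming each `index:weight` pair gives one attention head's share of the attribution; the helper name `dominant_head` is hypothetical and not part of any tool.

```python
# Head attribution weights copied from the listing above (head index -> weight).
head_weights = {
    0: 0.03, 1: 0.04, 2: 0.05, 3: 0.02, 4: 0.05, 5: 0.47,
    6: 0.06, 7: 0.03, 8: 0.05, 9: 0.04, 10: 0.09, 11: 0.04,
}


def dominant_head(weights):
    """Return (head, weight, share of the listed total) for the largest weight.

    Hypothetical helper for illustration only.
    """
    total = sum(weights.values())
    head = max(weights, key=weights.get)
    return head, weights[head], weights[head] / total


head, weight, share = dominant_head(head_weights)
total = sum(head_weights.values())
print(f"listed weights sum to {total:.2f}")
print(f"head {head}: weight {weight:.2f}, {share:.1%} of the listed total")
```

The listed weights are rounded to two decimals, so their sum need not be exactly 1; the share is computed against the listed total for that reason.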
Negative Logits
sidx        -1.97
ğ           -1.90
roth        -1.85
imura       -1.81
anwhile     -1.79
atis        -1.78
ulus        -1.76
Purg        -1.76
staking     -1.74
Spiegel     -1.68
Positive Logits
my            2.85
myself        2.67
My            2.18
MY            1.89
My            1.88
mathemat      1.87
teness        1.80
millenn       1.78
frustrations  1.73
injust        1.72
Activations Density 0.864%