Explanations
requesting or soliciting actions or information
the word "for" and its context within various phrases
Head Attribution Weights
Head  Weight
0     0.06
1     0.02
2     0.16
3     0.06
4     0.19
5     0.06
6     0.04
7     0.01
8     0.22
9     0.06
10    0.05
11    0.02
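The weights above are easier to read once they are ranked. The following is a small sketch that sorts the listed values. It shows that heads 8, 4 and 2 carry about 60% of the total listed weight. The listed weights sum to about 0.95, probably because of rounding. `HEAD_WEIGHTS`, `top_heads` and `share_of_total` are illustrative names, not part of any dashboard API.

```python
# Head attribution weights as listed above (head index -> weight).
HEAD_WEIGHTS = {
    0: 0.06, 1: 0.02, 2: 0.16, 3: 0.06, 4: 0.19, 5: 0.06,
    6: 0.04, 7: 0.01, 8: 0.22, 9: 0.06, 10: 0.05, 11: 0.02,
}


def top_heads(weights, k=3):
    """Return the k (head, weight) pairs with the largest weights, descending."""
    return sorted(weights.items(), key=lambda item: item[1], reverse=True)[:k]


def share_of_total(weights, heads):
    """Fraction of the summed weight carried by the given heads."""
    total = sum(weights.values())
    return sum(weights[h] for h in heads) / total


if __name__ == "__main__":
    leaders = top_heads(HEAD_WEIGHTS)
    print(leaders)  # [(8, 0.22), (4, 0.19), (2, 0.16)]
    print(round(share_of_total(HEAD_WEIGHTS, [h for h, _ in leaders]), 2))  # 0.6
```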
Negative Logits
Token     Logit
aten      -1.48
nown      -1.43
alde      -1.41
versely   -1.36
acca      -1.35
gart      -1.27
filled    -1.26
anked     -1.23
ogn       -1.23
along     -1.23
Positive Logits
Token         Logit
forgiveness   1.90
farewell      1.46
MSN           1.41
�             1.38  (byte-level token that does not render as text)
enance        1.34
pardon        1.34
こ            1.32
=-=-=-=-      1.31
consent       1.31
ITNESS        1.30
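Token lists like the two above are often produced with a "logit lens" style projection. The feature's output direction is multiplied by the model's unembedding matrix, and the tokens with the most negative and most positive scores are kept. The page does not confirm that these lists were built this way, so treat that as an assumption.

The sketch below shows the scoring step and the top-k/bottom-k step in pure Python. The names `token_scores`, `top_and_bottom`, `direction` and `unembed` are hypothetical. The toy vocabulary and numbers in the usage example are invented for illustration only.

```python
from typing import List, Sequence, Tuple


def token_scores(direction: Sequence[float],
                 unembed: Sequence[Sequence[float]]) -> List[float]:
    """Dot the feature direction (length d_model) with each unembedding column.

    `unembed` is d_model x vocab, so column j is the output embedding of token j.
    """
    vocab_size = len(unembed[0])
    return [
        sum(direction[i] * unembed[i][j] for i in range(len(direction)))
        for j in range(vocab_size)
    ]


def top_and_bottom(vocab: Sequence[str], scores: Sequence[float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (k most positive, k most negative) (token, score) pairs."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


if __name__ == "__main__":
    # Toy, invented data: a 2-dimensional model with a 4-token vocabulary.
    vocab = ["alpha", "beta", "gamma", "delta"]
    direction = [1.0, -0.5]
    unembed = [[1.0, -1.0, 0.5, 0.0],
               [0.0, 1.0, 0.0, 2.0]]
    scores = token_scores(direction, unembed)
    print(top_and_bottom(vocab, scores, k=2))
```

In a real model the vocabulary has tens of thousands of entries, so a vectorized matrix product would replace the nested loops. The ranking logic stays the same.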
Activation Density: 0.012%
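Activation density is the share of tokens on which the feature is active. A density of 0.012% means the feature fires on roughly 12 of every 100,000 tokens, which makes it very sparse.

The following is a minimal sketch of that calculation. It assumes that "active" means an activation strictly above zero; a dashboard may use a different threshold. The helper names are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation is strictly above `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


def expected_active_tokens(density_percent: float, n_tokens: int) -> float:
    """Expected number of active tokens in a corpus of n_tokens at a given density."""
    return density_percent / 100.0 * n_tokens


if __name__ == "__main__":
    print(activation_density([0.0, 0.0, 2.5, 0.0]))  # 25.0
    # At 0.012%, a one-million-token corpus yields about 120 active tokens.
    print(round(expected_active_tokens(0.012, 1_000_000)))  # 120
```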