INDEX
Explanations
pronouns and references to individuals, emphasizing personal agency and actions
Head Attr Weights
0:0.02
1:0.02
2:0.06
3:0.13
4:0.19
5:0.03
6:0.14
7:0.13
8:0.07
9:0.03
10:0.06
11:0.07
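
A minimal sketch for reading the weights above, assuming each `head:weight` line pairs an attention head index with its attribution weight. The names `head_weights` and `top_heads` are hypothetical helpers for illustration, not part of any dashboard API. The values are copied from the listing; they sum to 0.95 rather than 1.00, which may simply reflect rounding to two decimals.

```python
# Head attribution weights copied from the listing above (head index -> weight).
head_weights = {0: 0.02, 1: 0.02, 2: 0.06, 3: 0.13, 4: 0.19, 5: 0.03,
                6: 0.14, 7: 0.13, 8: 0.07, 9: 0.03, 10: 0.06, 11: 0.07}


def top_heads(weights, k=3):
    """Return the k (head, weight) pairs with the largest weight.

    Ties are broken by the lower head index.
    """
    return sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Total of the listed weights, rounded to the precision of the listing.
total = round(sum(head_weights.values()), 2)

# The three heads with the largest attribution weight.
ranked = top_heads(head_weights)
print(total, ranked)
```

Ranking this way shows the attribution concentrated in heads 4, 6, 3 and 7, which together account for 0.59 of the listed weight.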
Negative Logits
Thumbnails  -1.89
thumbnails  -1.53
Fantasy     -1.45
版          -1.41
══          -1.40
Offline     -1.39
Enh         -1.38
Vector      -1.38
ANC         -1.35
CLUS        -1.31
Positive Logits
bothered     1.68
chosen       1.43
erous        1.38
bother       1.37
seeming      1.30
bothers      1.28
spends       1.28
replied      1.27
volunteered  1.27
heck         1.25
Activations Density 0.006%