Index
Explanations
phrases that imply responsibility or burden, particularly in a communal context
Head Attr Weights
0:0.01
1:0.01
2:0.07
3:0.05
4:0.09
5:0.02
6:0.05
7:0.49
8:0.02
9:0.02
10:0.06
11:0.06
Negative Logits
rapport: -1.63
cius: -1.51
ntax: -1.38
mble: -1.37
399: -1.34
nery: -1.33
regulars: -1.33
reception: -1.32
commenters: -1.32
ovan: -1.32
Positive Logits
packs: 1.97
龍喚士: 1.88
packs: 1.78
神: 1.76
sleeves: 1.75
光: 1.75
rollers: 1.75
覚醒: 1.73
safely: 1.72
�: 1.71
Activations Density 0.024%