INDEX
Explanations
expressions of self-congratulation or self-praise
Head Attr Weights
0:0.02
1:0.03
2:0.08
3:0.15
4:0.02
5:0.03
6:0.10
7:0.11
8:0.06
9:0.17
10:0.07
11:0.11
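The weights above can be ranked to see which attention heads contribute most to this feature. The following is a minimal sketch in Python, assuming each line reads as `head index:weight`. The helper name `top_heads` and the tie-breaking rule are illustrative assumptions, not part of the original dashboard.

```python
# Head attribution weights copied from the listing above (head index -> weight).
head_weights = {
    0: 0.02, 1: 0.03, 2: 0.08, 3: 0.15, 4: 0.02, 5: 0.03,
    6: 0.10, 7: 0.11, 8: 0.06, 9: 0.17, 10: 0.07, 11: 0.11,
}


def top_heads(weights, k=3):
    """Return the k (head, weight) pairs with the largest weights.

    Hypothetical helper: ties are broken by the lower head index.
    """
    return sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Sum of the rounded weights as listed.
total = sum(head_weights.values())

# The three heads with the largest attribution weights.
ranked = top_heads(head_weights)

print(f"sum of listed weights: {total:.2f}")
for head, weight in ranked:
    print(f"head {head}: {weight:.2f}")
```

With the values as listed, heads 9 and 3 carry the largest weights, and the rounded weights sum to about 0.95.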
Negative Logits
difficulty: -1.10
arrang: -1.05
repayment: -1.02
counsel: -1.00
��: -0.99
Var: -0.97
Difficulty: -0.96
quantity: -0.96
etheless: -0.96
bouts: -0.94
Positive Logits
akura: 1.16
ONY: 1.15
llah: 1.14
ylon: 1.12
quo: 1.10
ippi: 1.09
orsi: 1.07
Adams: 1.07
Columb: 1.06
anamo: 1.06
Activations Density 0.002%