INDEX
Explanations
instances of immediate responses or actions regarding requests and comments
Head Attr Weights
0:0.01
1:0.01
2:0.09
3:0.12
4:0.09
5:0.03
6:0.08
7:0.23
8:0.03
9:0.05
10:0.10
11:0.12
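
The weights listed above can be ranked to see which attention heads contribute most. The following is a minimal sketch in Python, assuming each `index:weight` pair is one head's share of the attribution; the variable names are hypothetical, and the values are copied from the listing.

```python
# Head attribution weights copied from the listing above (head index -> weight).
# Assumption: each value is that head's share of the total attribution.
head_attr_weights = {
    0: 0.01, 1: 0.01, 2: 0.09, 3: 0.12, 4: 0.09, 5: 0.03,
    6: 0.08, 7: 0.23, 8: 0.03, 9: 0.05, 10: 0.10, 11: 0.12,
}

# Rank heads from largest to smallest weight (ties keep listing order).
ranked = sorted(head_attr_weights.items(), key=lambda kv: kv[1], reverse=True)
top_head, top_weight = ranked[0]

# Sum of the listed weights, for a quick sanity check.
total = sum(head_attr_weights.values())

print(f"top head: {top_head} (weight {top_weight:.2f})")
print(f"sum of listed weights: {total:.2f}")
```

Head 7 carries the largest listed weight (0.23), followed by heads 3 and 11 (0.12 each). The listed values sum to 0.96 rather than 1.00, which may simply reflect rounding to two decimals.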
Negative Logits
seless: -1.68
Impro: -1.35
bragging: -1.35
Impro: -1.25
lies: -1.23
vironment: -1.23
overboard: -1.22
niche: -1.21
Whoever: -1.21
Purpose: -1.21
Positive Logits
nor: 1.81
chell: 1.50
webkit: 1.45
anymore: 1.44
condem: 1.35
confirm: 1.35
>): 1.34
FINE: 1.32
confirmation: 1.32
ucl: 1.30
Activations Density 0.005%