INDEX
Explanations
verbs and phrases indicating intentional or deliberate actions
Head Attr Weights
0:0.08
1:0.03
2:0.15
3:0.08
4:0.17
5:0.11
6:0.03
7:0.02
8:0.09
9:0.11
10:0.05
11:0.03
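
To see which heads dominate, the weights above can be ranked. This is a minimal Python sketch, assuming each `head:weight` pair is a per-head attribution weight for this feature; the name `head_attr` is hypothetical and not part of the dashboard.

```python
# Head attribution weights copied from the list above (head index -> weight).
# `head_attr` is an illustrative name, not something the dashboard exposes.
head_attr = {0: 0.08, 1: 0.03, 2: 0.15, 3: 0.08, 4: 0.17, 5: 0.11,
             6: 0.03, 7: 0.02, 8: 0.09, 9: 0.11, 10: 0.05, 11: 0.03}

# Rank heads from largest to smallest weight; ties keep head-index order
# because Python's sort is stable.
ranked = sorted(head_attr.items(), key=lambda kv: -kv[1])
top3 = ranked[:3]

# Sum of the displayed weights, rounded to the two decimals shown above.
total = round(sum(head_attr.values()), 2)
print(top3, total)
```

Heads 4 and 2 carry the largest weights. The displayed values sum to about 0.95 rather than 1, which may simply reflect rounding in the display.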
Negative Logits
anian: -1.46
shire: -1.35
Sirius: -1.32
insula: -1.32
ilion: -1.31
yip: -1.27
LI: -1.25
Score: -1.22
cells: -1.20
asia: -1.18
Positive Logits
deceived: 1.26
unlawful: 1.22
unethical: 1.16
manslaughter: 1.16
iazep: 1.16
forbidden: 1.15
gou: 1.14
prohibited: 1.11
pelled: 1.11
harm: 1.11
Activations Density 0.019%
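
For orientation, here is a minimal Python sketch of how a density figure like the one above could be computed. It assumes that density means the percentage of tokens on which the feature's activation is above zero; the function name `activation_density` and the toy data are hypothetical, not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of tokens on which the feature fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy data (not real activations): 19 active tokens out of 100,000.
acts = [0.0] * 99_981 + [1.5] * 19
density = activation_density(acts)
print(f"{density:.3f}%")
```

Under that assumption, a density of 0.019% corresponds to the feature firing on roughly 19 of every 100,000 tokens.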