Explanations
terms related to consequences and implications of actions or events
Head Attr Weights
0:0.30
1:0.01
2:0.19
3:0.06
4:0.03
5:0.04
6:0.02
7:0.03
8:0.02
9:0.02
10:0.20
11:0.02
Negative Logits
malfunction   -2.30
flawless      -2.27
remod         -2.26
refurb        -2.25
reclaim       -2.23
incompet      -2.23
reclaimed     -2.22
complying     -2.22
warranty      -2.17
struggling    -2.17
Positive Logits
ramifications  2.90
implications   2.64
Draft          2.25
consequences   2.21
lethal         2.20
hammer         2.19
GOODMAN        2.18
entially       2.14
Hor            2.14
Yiannopoulos   2.12
Activations Density 0.015%