Explanations
words related to rejection or denial
Head Attr Weights
0:0.12
1:0.03
2:0.11
3:0.03
4:0.06
5:0.09
6:0.06
7:0.07
8:0.04
9:0.07
10:0.18
11:0.09
Negative Logits
��                 -1.26
natureconservancy  -1.24
ghai               -1.21
angular            -1.20
assic              -1.15
Orbit              -1.12
Milky              -1.11
ocene              -1.10
breeze             -1.06
tidal              -1.06
Positive Logits
unsuccessfully  1.35
unsub           1.25
repeatedly      1.19
angrily         1.15
Examples        1.15
citing          1.13
amnesty         1.13
falsely         1.12
apologize       1.10
claim           1.07
Activations Density 0.084%