INDEX
Explanations
phrases indicating the effects of various issues or phenomena
Head Attr Weights
0:0.04
1:0.02
2:0.15
3:0.04
4:0.28
5:0.07
6:0.02
7:0.02
8:0.12
9:0.14
10:0.04
11:0.01
Negative Logits
WithNo:-1.41
�:-1.35
willing:-1.30
NAS:-1.26
Jess:-1.26
č:-1.25
enture:-1.24
Cour:-1.23
��:-1.21
ISO:-1.21
Positive Logits
ratom:1.61
hops:1.47
effects:1.43
exacerbate:1.42
impacts:1.40
worsen:1.33
pollutants:1.31
harm:1.30
adversely:1.27
destabil:1.26
Activations Density
0.007%