INDEX
Explanations
phrases that express concern or refer to risks and uncertainties
Head Attr Weights
0:0.02
1:0.01
2:0.07
3:0.06
4:0.33
5:0.03
6:0.03
7:0.17
8:0.03
9:0.04
10:0.07
11:0.09
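
As a reading aid, the following minimal Python sketch ranks the head attribution weights listed above. The values are copied from the listing; the names `head_weights` and `rank_heads` are illustrative assumptions and not part of the dashboard or any library.

```python
# Head attribution weights copied from the listing (head index -> weight).
head_weights = {
    0: 0.02, 1: 0.01, 2: 0.07, 3: 0.06, 4: 0.33, 5: 0.03,
    6: 0.03, 7: 0.17, 8: 0.03, 9: 0.04, 10: 0.07, 11: 0.09,
}


def rank_heads(weights):
    """Return (head, weight) pairs sorted by weight, largest first."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)


ranked = rank_heads(head_weights)
top_head, top_weight = ranked[0]
total = sum(head_weights.values())

# Report the two heads with the largest attribution weight.
for head, weight in ranked[:2]:
    print(f"head {head}: weight {weight:.2f}")
print(f"sum of listed weights: {total:.2f}")
```

With these values, head 4 carries the largest weight and head 7 the second largest. The listed weights sum to 0.95 rather than 1.00, which may reflect rounding to two decimals.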
Negative Logits
innocence
-1.64
iphany
-1.59
disconnected
-1.47
eness
-1.45
metaphor
-1.42
feeling
-1.40
silence
-1.40
witness
-1.40
bitterness
-1.38
represented
-1.38
Positive Logits
lees
1.68
ossal
1.58
airs
1.45
Hels
1.42
":{"
1.41
Locations
1.40
Garg
1.39
��
1.38
Siem
1.36
jas
1.36
Activations Density 0.001%