INDEX
Explanations
statements related to human rights violations and their consequences
Head Attr Weights
0:0.07
1:0.04
2:0.07
3:0.06
4:0.20
5:0.07
6:0.05
7:0.21
8:0.04
9:0.04
10:0.04
11:0.06
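As a rough illustration of how the head attribution weights above might be read, the following sketch ranks the heads and reports how much of the listed weight the top two carry. The values are copied from the list; the names `head_weights` and `top_heads` are hypothetical and are not part of the dashboard or any library.

```python
# Head attribution weights copied from the list above (head index -> weight).
head_weights = {
    0: 0.07, 1: 0.04, 2: 0.07, 3: 0.06, 4: 0.20, 5: 0.07,
    6: 0.05, 7: 0.21, 8: 0.04, 9: 0.04, 10: 0.04, 11: 0.06,
}


def top_heads(weights, k=2):
    """Return the k (head, weight) pairs with the largest weight."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Sum of the listed (rounded) weights; it need not be exactly 1.
total = sum(head_weights.values())
top = top_heads(head_weights)
# Fraction of the listed weight carried by the top-k heads.
share = sum(w for _, w in top) / total

print(top)
print(round(total, 2), round(share, 2))
```

Under these numbers, heads 7 and 4 stand out from the rest, which all sit between 0.04 and 0.07.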
Negative Logits
orno     -2.40
enthus   -2.38
ezvous   -2.33
nery     -2.28
ウス     -2.28
him      -2.27
Born     -2.24
hangs    -2.23
His      -2.23
achine   -2.20
Positive Logits
Its            3.29
its            3.11
Its            2.96
cited          2.73
its            2.50
Recommend      2.49
emphasis       2.45
Footnote       2.44
itself         2.41
spokesperson   2.35
Activations Density 0.348%