INDEX
Explanations
acronyms and abbreviations related to human rights and organizations
Head Attribution Weights (head:weight)
0:0.03
1:0.02
2:0.06
3:0.05
4:0.04
5:0.05
6:0.42
7:0.04
8:0.05
9:0.06
10:0.07
11:0.04
Negative Logits
uania       -1.58
verages     -1.41
qualify     -1.32
inav        -1.30
gpu         -1.29
minecraft   -1.27
yden        -1.22
nuance      -1.21
boycott     -1.17
predict     -1.15
Positive Logits
Bib         1.50
Û           1.50
Railway     1.41
nown        1.38
aroo        1.31
apple       1.30
aneers      1.29
hens        1.27
models      1.24
arie        1.24
Activation Density: 0.001%