INDEX
Explanations
numeric values related to sentences or passages in a legal or news context
phrases related to legal actions and consequences
Head Attr Weights
0:0.05
1:0.01
2:0.16
3:0.04
4:0.25
5:0.04
6:0.02
7:0.01
8:0.23
9:0.07
10:0.04
11:0.01
Negative Logits
rouse:-1.31
ignant:-1.30
pir:-1.29
enegger:-1.27
PRESS:-1.22
ergy:-1.16
�:-1.16
QL:-1.15
eff:-1.15
Squirrel:-1.14
Positive Logits
Pastebin:1.32
bin:1.28
umbn:1.27
iths:1.24
washing:1.23
lain:1.21
mats:1.19
ppe:1.16
Bless:1.15
geon:1.14
Activations Density 0.006%