Explanations
instances of civil liability and attempts to distract from serious allegations
Head Attr Weights (head: weight)
0: 0.07
1: 0.06
2: 0.01
3: 0.23
4: 0.05
5: 0.16
6: 0.06
7: 0.02
8: 0.08
9: 0.20
10: 0.01
11: 0.01
Negative Logits
sth         -1.66
vision      -1.66
cradle      -1.60
Compass     -1.59
Certified   -1.56
single      -1.51
varies      -1.50
Avg         -1.49
Mé          -1.49
canopy      -1.48
Positive Logits
unrelated       2.04
ulz             2.00
embarrassment   1.95
inflammatory    1.93
blackmail       1.91
embarrassing    1.87
discredit       1.87
meaningless     1.84
unsuspecting    1.78
unpopular       1.78
Activations Density: 0.823%