INDEX
Explanations
instances of violence and physical assault
Head Attr Weights
0:0.14
1:0.01
2:0.13
3:0.06
4:0.08
5:0.05
6:0.04
7:0.04
8:0.18
9:0.07
10:0.09
11:0.07
Negative Logits
uyomi         -2.00
guiActiveUn   -1.76
Bulletin      -1.62
ageing        -1.59
relegation    -1.57
pione         -1.56
Vanguard      -1.55
odcast        -1.55
Dhabi         -1.54
Pulitzer      -1.53
Positive Logits
SPONSORED     1.75
nce           1.70
hered         1.61
them          1.61
Bulgar        1.60
isters        1.60
eers          1.56
undy          1.55
seek          1.54
lifting       1.52
Activations Density: 0.002%