INDEX
Explanations
colons followed by statements or quotes
Head Attr Weights
0:0.02
1:0.02
2:0.17
3:0.06
4:0.13
5:0.02
6:0.04
7:0.16
8:0.05
9:0.04
10:0.08
11:0.15
Negative Logits
Surveillance   -1.44
Rider          -1.44
crime          -1.37
licensed       -1.36
steroids       -1.29
sweats         -1.29
taxis          -1.29
ordinances     -1.27
surveillance   -1.27
killer         -1.26
Positive Logits
rief           1.72
tions          1.69
hesis          1.67
etus           1.64
enario         1.50
enture         1.48
llo            1.48
objection      1.47
congress       1.47
podium         1.46
Activations Density 0.001%