INDEX
Explanations
phrases related to surveillance and potentially violent incidents
mentions of eavesdropping and fatal incidents
Negative Logits
tsky      -0.85
naires    -0.77
lli       -0.75
anan      -0.70
sights    -0.70
icals     -0.64
Gram      -0.64
yon       -0.63
ogether   -0.63
hift      -0.62
Positive Logits
cens      0.80
psc       0.76
odied     0.75
enegger   0.75
disarm    0.73
izophren  0.73
ufact     0.71
disband   0.70
indebted  0.68
prof      0.68
Activations Density 0.018%
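
The density figure can be read as the share of sampled tokens on which the feature fires. The sketch below is only an illustration of that reading: the function name, the zero threshold, and the sample counts are assumptions, not taken from the dashboard, and the tool that produced this page may compute density differently.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    feature is active. The threshold of 0.0 is a hypothetical default.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 18 active tokens out of 100,000.
sample = [1.0] * 18 + [0.0] * (100_000 - 18)
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.018% corresponds to roughly 18 active tokens per 100,000 sampled, which marks the feature as sparse.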