Explanations
mentions of specific organizations being considered as potential targets for particular actions
Negative Logits
daq       -0.74
stan      -0.72
boy       -0.70
loo       -0.70
nick      -0.69
CV        -0.68
beat      -0.68
edin      -0.67
book      -0.66
wa        -0.66
Positive Logits
geries         1.02
beginners      0.98
bidden         0.97
determining    0.96
locating       0.95
gery           0.93
constructing   0.93
navigating     0.92
sorts          0.90
selecting      0.89
Activation Density: 0.135%