INDEX
Explanations
phrases related to national security or public safety
references to national security and public safety concerns
Negative Logits
Ly          -0.73
Lot         -0.72
Mask        -0.70
ANN         -0.67
igs         -0.65
confess     -0.64
Zen         -0.62
married     -0.62
verse       -0.62
Canaver     -0.61
Positive Logits
wellbeing       0.96
deterrence      0.87
jriwal          0.87
crises          0.86
interests       0.85
preservation    0.83
objectives      0.82
liberties       0.81
endanger        0.80
rity            0.80
Activations Density: 0.224%