Explanations
terminology related to law enforcement and security
phrases related to security and risk assessment
Negative Logits
princ          -0.73
Everywhere     -0.71
thood          -0.64
coffin         -0.64
ruary          -0.64
everywhere     -0.64
unanimous      -0.64
congratulated  -0.63
Bliss          -0.62
ecstatic       -0.60
Positive Logits
pose             0.97
endanger         0.96
inappropriately  0.92
adversely        0.92
reasonably       0.89
otherwise        0.88
harm             0.86
posed            0.86
jeopard          0.85
potentially      0.84
Activations Density: 0.361%