INDEX
Explanations
words related to information security, political intrigue, and international relations
expressions of clarity and importance regarding social or political issues
Negative Logits
respectively
-0.55
ゴン
-0.52
FAULT
-0.51
ielding
-0.50
bia
-0.50
iga
-0.49
interstitial
-0.48
("
-0.48
Reilly
-0.48
inducing
-0.47
Positive Logits
..."
1.16
[/
1.09
?'
1.03
)",
1.02
%"
1.02
…"
1.01
)"
0.98
,''
0.96
[/
0.93
),"
0.93
Activations Density 1.525%