Explanations
names of government positions followed by the word "secret"
terms related to confidentiality and secrecy
Negative Logits
Springer   -0.73
brim       -0.70
à          -0.66
odcast     -0.63
alone      -0.63
asts       -0.61
gaard      -0.60
avers      -0.60
Surv       -0.59
Tornado    -0.58
Positive Logits
secret     1.35
Secret     1.14
rets       0.92
secret     0.90
ategic     0.88
terday     0.84
arial      0.84
ulously    0.83
uously     0.78
ories      0.77
Activation Density: 0.007%