INDEX
Explanations
references to secretive or confidential information
occurrences of the term "secret."
Negative Logits
avers     -0.77
older     -0.77
annis     -0.74
©¶æ       -0.68
brim      -0.68
lihood    -0.68
gaard     -0.67
ammers    -0.67
ð         -0.67
È         -0.67
Positive Logits
secret       1.18
secret       1.10
Secret       1.03
rets         1.01
secrets      0.99
arial        0.92
ariat        0.88
informant    0.83
hidden       0.80
ulously      0.79
Activations Density: 0.009%