INDEX
Explanations
phrases related to secrecy or hiding
Negative Logits
Token         Logit
ctive         -0.71
oday          -0.71
baugh         -0.71
reciation     -0.70
nesota        -0.69
rompt         -0.69
ucc           -0.69
aldi          -0.69
annis         -0.69
ragon         -0.67
Positive Logits
Token         Logit
secrets       0.98
confidential  0.90
hidden        0.89
Secrets       0.88
ariat         0.87
arial         0.87
cloaked       0.85
informant     0.85
secret        0.84
stash         0.81
Activations Density: 2.543%