Explanations
references to conspiracy theories and government covert operations
Negative Logits
ofire       -0.14
ains        -0.14
elry        -0.14
erupt       -0.14
Inspection  -0.13
ifest       -0.13
ga          -0.13
Inspect     -0.12
Sherlock    -0.12
íd          -0.12
Positive Logits
secret      0.41
Secret      0.33
-secret     0.31
SECRET      0.30
secrets     0.30
classified  0.29
secret      0.29
Secret      0.29
秘          0.29
secretly    0.28
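Logit lists like the two above are commonly read off by projecting a feature's decoder direction through the model's unembedding matrix and keeping the largest and smallest entries. The sketch below assumes that setup. The names `top_logit_effects`, `w_dec_row`, `W_U` and `vocab` are hypothetical, any final LayerNorm is ignored, and the toy numbers are made up. It is not necessarily how this dashboard produced its values.

```python
import numpy as np

def top_logit_effects(w_dec_row, W_U, vocab, k=10):
    """Return the k most positive and k most negative direct logit effects of one feature.

    Assumption: the feature's decoder row is projected straight through the
    unembedding W_U (shape d_model x vocab_size); any final LayerNorm is ignored.
    """
    logits = w_dec_row @ W_U               # one score per vocabulary token
    order = np.argsort(logits)             # token indices, most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy, made-up numbers: a 2-dimensional residual stream and a 4-token vocabulary.
vocab = ["secret", "Sherlock", "classified", "Inspect"]
W_U = np.array([[0.4, -0.1, 0.3, -0.2],
                [0.0,  0.0, 0.0,  0.0]])
w_dec_row = np.array([1.0, 0.0])

positive, negative = top_logit_effects(w_dec_row, W_U, vocab, k=2)
print(positive)  # [('secret', 0.4), ('classified', 0.3)]
print(negative)  # [('Inspect', -0.2), ('Sherlock', -0.1)]
```

Because only the decoder direction and the unembedding are involved, these scores describe what the feature pushes the output toward, not the inputs that make it fire.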
Activation density: 0.051%
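A density of 0.051% means the feature is active on roughly one token in 2,000 (1 / 0.00051 ≈ 1,961). A minimal sketch of the computation, assuming density is the fraction of sampled tokens whose activation is above zero. The function name `activation_density`, the zero threshold and the synthetic activations are hypothetical.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds threshold.

    Assumption: "active" means activation > 0; a dashboard may use another cutoff.
    """
    acts = np.asarray(acts, dtype=float)
    return float(np.mean(acts > threshold))

# Synthetic example: 51 firing tokens out of 100,000 sampled tokens.
acts = np.zeros(100_000)
acts[:51] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")      # 0.051%
print(round(1 / density))    # 1961, i.e. about one firing token per 1,961 tokens
```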