Explanations
secrets and hidden information
Negative Logits
<0x80>    0.63
0         0.61
Sper      0.60
ти        0.56
Seda      0.56
Gew       0.55
Zur       0.55
Telefon   0.55
Bernd     0.55
<0xAB>    0.54
Positive Logits
Secrets   1.27
secrets   1.24
secrets   1.18
secret    1.06
secret    1.05
secretos  1.01
SECRET    0.96
비밀      0.96
Secrets   0.95
secrete   0.95
Activations Density: 0.068%