INDEX
Explanations
instances of the word "reveal" and its variations
Negative Logits
ello      -0.16
elier     -0.16
estre     -0.15
olan      -0.15
à¯įà®     -0.15
Unidos    -0.15
itten     -0.14
-Ñħ       -0.14
igu       -0.14
SSIP      -0.14
Positive Logits
secrets   0.27
details   0.23
ry        0.21
Secrets   0.20
why       0.20
hidden    0.19
how       0.18
iance     0.18
tid       0.18
ments     0.17
Activations Density: 0.031%