INDEX
Explanations
phrases indicating significant revelations or identity changes in a narrative context
Negative Logits
lect            -0.15
Trigger         -0.15
£i              -0.15
ãģ§ãģĹãĤĩãģĨ    -0.14
regs            -0.14
ect             -0.13
#               -0.13
etooth          -0.13
Alone           -0.13
alone           -0.13
Positive Logits
unb             0.19
secret          0.18
quires          0.17
secretly        0.17
Secret          0.16
secret          0.15
uther           0.15
ecret           0.15
footer          0.15
-secret         0.14
Activations Density 0.181%