Explanations
sentences that prompt the reader to take action or engage with content
phrases that indicate something was missed or not seen
Negative Logits
eteenth     -0.71
Officers    -0.65
mination    -0.62
ettel       -0.62
kas         -0.62
Aim         -0.61
ospel       -0.60
kiss        -0.59
reth        -0.55
dq          -0.55
Positive Logits
anything     0.94
any          0.77
me           0.76
spoilers     0.71
PW           0.69
anything     0.69
Pastebin     0.69
yourselves   0.67
ANY          0.65
yourself     0.65
Activation density: 0.158%
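The numbers above can be read with a small sketch. It assumes a common dashboard convention that this page does not itself state: activation density is the fraction of token positions where the feature's activation is above zero, and the positive/negative logits are the feature's decoder direction projected through the model's unembedding matrix, then sorted. The function names (`activation_density`, `logit_effects`, `top_logits`) are hypothetical, chosen for illustration only.

```python
from typing import List, Sequence, Tuple


def activation_density(activations: Sequence[float]) -> float:
    """Fraction of token positions where the feature is active (activation > 0).

    Assumed convention; a dashboard may use a different firing threshold.
    """
    if not activations:
        return 0.0
    return sum(1 for a in activations if a > 0) / len(activations)


def logit_effects(decoder_row: Sequence[float],
                  unembed: Sequence[Sequence[float]],
                  vocab: Sequence[str]) -> List[Tuple[str, float]]:
    """Project a feature's decoder direction through the unembedding.

    decoder_row has length d_model; unembed is d_model x vocab_size.
    Returns one (token, effect) pair per vocabulary entry. A list of pairs
    is used rather than a dict, since distinct tokens can print identically.
    """
    return [
        (tok, sum(decoder_row[i] * unembed[i][j] for i in range(len(decoder_row))))
        for j, tok in enumerate(vocab)
    ]


def top_logits(effects: List[Tuple[str, float]], k: int = 10):
    """Return the k most positive and the k most negative (token, effect) pairs."""
    ranked = sorted(effects, key=lambda pair: pair[1])
    negative = ranked[:k]          # smallest effects first
    positive = ranked[::-1][:k]    # largest effects first
    return positive, negative


if __name__ == "__main__":
    effects = logit_effects([1.0, -1.0],
                            [[0.5, 0.0, 2.0], [0.0, 1.0, 1.0]],
                            ["a", "b", "c"])
    print(top_logits(effects, k=1))
    print(activation_density([0.0, 0.0, 1.2, 0.0]))
```

Under this reading, a density of 0.158% means the feature fires on roughly 1.6 of every 1,000 tokens.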