INDEX
Explanations
phrases that indicate preparation or setup
phrases related to preparation or positioning for future events
Negative Logits
sake          -0.61
Pastebin      -0.60
simplicity    -0.59
abstraction   -0.58
loss          -0.57
cultured      -0.56
tein          -0.55
denomin       -0.55
deduct        -0.52
mentioning    -0.52
Positive Logits
for           0.84
for           0.84
entin         0.77
Ready         0.75
toward        0.75
ombo          0.75
eele          0.74
Against       0.73
GGGGGGGG      0.73
For           0.73
Activations Density: 0.284%