INDEX
Explanations
word sequences that indicate future actions or events
Negative Logits
lees      -0.86
xual      -0.74
lee       -0.71
Franch    -0.70
then      -0.70
theless   -0.69
olor      -0.67
ords      -0.67
still     -0.66
tics      -0.66
Positive Logits
generation    1.21
installment   1.10
step          1.03
iteration     0.99
phase         0.98
logical       0.98
batch         0.95
millenn       0.93
decade        0.92
chapter       0.89
Activations Density 0.048%