Explanations
titles or instructions starting with "How"
phrases beginning with "How" that imply instructions or guidance
Negative Logits
usher         -0.70
Tanz          -0.68
uthor         -0.66
ivity         -0.65
IFIED         -0.65
elimination   -0.63
hereafter     -0.61
peak          -0.59
mor           -0.57
pont          -0.56
Positive Logits
soever        0.95
lers          0.86
itzer         0.82
ling          0.77
ells          0.74
ever          0.73
links         0.72
umbai         0.70
abouts        0.70
Steps         0.69
Activations Density: 0.055%