Explanations
sections that are formatted as citations with a colon
colons that introduce lists or details
Negative Logits
ecided      -0.80
spont       -0.74
obbies      -0.73
pherd       -0.70
schild      -0.69
undai       -0.69
avorite     -0.68
eryl        -0.67
milo        -0.67
outl        -0.66
Positive Logits
][          0.90
leg         0.78
Latest      0.76
memory      0.74
Explicit    0.74
lement      0.70
Provided    0.70
::::::::    0.69
Retro       0.68
Comic       0.67
Activations Density 0.033%