Explanations
starts of new sections or paragraphs in a text
instances of the word "Next" indicating a sequence or continuation in a text
Negative Logits
Token      Logit
lees       -0.66
kay        -0.64
zinski     -0.62
ocker      -0.61
ans        -0.61
arbon      -0.61
acons      -0.59
ondon      -0.58
ogether    -0.57
ITH        -0.57
Positive Logits
Token       Logit
Steps       1.10
door        1.04
week        0.98
steps       0.96
month       0.94
Generation  0.92
generation  0.87
Month       0.86
door        0.85
year        0.85
Activations Density: 0.028%