INDEX
Explanations
transitions or markers indicating sections or segments in a text
Negative Logits
ſtate       -0.97
myſelf      -0.94
ſche        -0.92
ſelf        -0.91
pleaſure    -0.90
ſever       -0.89
purpoſe     -0.88
ſeveral     -0.87
ftate       -0.87
uſ          -0.85
Positive Logits
Finally      0.55
Lastly       0.53
↵↵           0.50
↵            0.50
Finally      0.48
Alongside    0.45
Aside        0.44
Despite      0.42
Lastly       0.42
наконец      0.40
Activations Density: 0.084%