INDEX
Explanations
conjunctions connecting phrases or clauses
the presence of document delimiters or markers indicating the start and end of content
Negative Logits
bub         -0.53
ÂŃ          -0.52
—-          -0.51
.*          -0.50
hard        -0.49
uncont      -0.48
hub         -0.48
legally     -0.48
.#          -0.47
campaign    -0.47
Positive Logits
romeda      1.05
rew         1.01
rogens      0.97
ERSON       0.94
rogen       0.87
rost        0.72
rea         0.71
secondly    0.70
alus        0.70
then        0.69
Activations Density 0.065%