INDEX
Explanations
instances of the word "On" used to indicate transitions or themes in the text
Negative Logits
/by         -0.19
/from       -0.18
imli        -0.17
ãĤĩ         -0.15
duct        -0.15
олÑĮно      -0.15
clusions    -0.14
memberof    -0.14
alties      -0.14
cul         -0.14
Positive Logits
ward        0.30
balance     0.28
closer      0.26
average     0.25
paper       0.24
es          0.24
reflection  0.24
top         0.22
thing       0.21
rare        0.21
Activations Density: 0.072%
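
For readers unfamiliar with the density figure, here is a minimal sketch of one common way such a number can be computed: the fraction of token positions at which the feature's activation is above zero. The function name `activation_density`, the zero threshold, and the toy data are assumptions made for illustration; the page does not say how its own figure was derived.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires at all. The source page does not define it.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 100,000 token positions, 72 of them active.
sample = [0.0] * 99_928 + [1.0] * 72
density = activation_density(sample)
print(f"{density:.3%}")
```

With this made-up sample the feature fires on 72 of 100,000 positions, which is the same order of sparsity as the figure reported above.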