INDEX
Explanations
conjunctions and phrases indicating continuity or addition
Negative Logits
atte        -0.17
aises       -0.16
Beled       -0.15
gart        -0.15
ーム        -0.15
gne         -0.14
inded       -0.14
erate       -0.14
oplevel     -0.14
yn          -0.14
Positive Logits
vi          0.17
viol        0.16
bane        0.16
untu        0.15
ills        0.15
-selector   0.15
ibri        0.15
onto        0.14
asta        0.14
609         0.14
Activations Density 0.188%