INDEX
Explanations
conjunctions and linking words in the text
Negative Logits
ness          -0.17
elage         -0.15
utherford     -0.14
'er           -0.14
bef           -0.14
Shut          -0.14
ê²            -0.13
arkers        -0.13
arty          -0.13
arness        -0.13
Positive Logits
uf            0.17
rok           0.16
vant          0.16
Stevenson     0.15
нии           0.15
arning        0.15
especially    0.14
atha          0.14
дÑı           0.14
Pist          0.14
Activations Density: 0.270%