INDEX
Explanations
words indicating transitions and significance in narratives or arguments
Negative Logits
meanwhile     -0.17
allerdings    -0.16
sice          -0.15
however       -0.15
amen          -0.14
然而          -0.14
nt            -0.14
однако        -0.14
either        -0.13
ainsi         -0.13
Positive Logits
forth         0.20
importantly   0.20
vice          0.18
vice          0.16
ebek          0.16
/OR           0.15
(?)           0.15
(!            0.15
some          0.15
even          0.15
Activations Density 0.236%