Explanations
the word "but" and similar conjunctions indicating contrast or opposition
Negative Logits
uppe       -0.17
бо         -0.16
§          -0.16
olly       -0.16
ops        -0.16
therefore  -0.16
alike      -0.15
xies       -0.15
ziel       -0.15
beth       -0.15
Positive Logits
term        0.24
lers        0.24
chers       0.24
ressing     0.24
ler         0.23
ters        0.22
ressed      0.22
åĩ¡         0.21
rint        0.21
tpl         0.21
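The page does not say how these two lists were produced. One common approach, assumed here and not confirmed by the dashboard, is to project the feature's decoder direction through the model's unembedding matrix (shape d_model x vocab) and then keep the 10 most positive and 10 most negative per-token effects. The helper names `logit_effects` and `top_logit_effects` are hypothetical, and the toy numbers below are illustrative only.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, float]


def logit_effects(decoder_row: List[float],
                  unembed: List[List[float]],
                  vocab: List[str]) -> Dict[str, float]:
    """Hypothetical: dot the feature's decoder direction with each unembedding column.

    unembed is indexed as unembed[model_dim][token_index] (d_model x vocab).
    """
    effects = {}
    for j, token in enumerate(vocab):
        effects[token] = sum(d * unembed[i][j] for i, d in enumerate(decoder_row))
    return effects


def top_logit_effects(effects: Dict[str, float], k: int = 10) -> Tuple[List[Pair], List[Pair]]:
    """Return (positive, negative): the k largest and the k smallest effects."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


# Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
vocab = ["but", "however", "term", "lers"]
decoder_row = [1.0, -1.0]
unembed = [[0.1, 0.2, 0.5, 0.4],
           [0.3, 0.1, 0.0, 0.1]]
pos, neg = top_logit_effects(logit_effects(decoder_row, unembed, vocab), k=2)
for token, value in pos:
    print(f"{token:<10} {value:+.2f}")
```

Ranking by a single sort keeps the sketch short; for a real vocabulary of tens of thousands of tokens a partial selection (for example `heapq.nlargest`) would avoid sorting everything.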
Activation density: 0.048%
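Activation density is usually read as the fraction of token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above zero; the function name `activation_density` and the `threshold` parameter are hypothetical. A density of 0.048% works out to 48 active positions per 100,000 tokens (48 / 100,000 = 0.00048).

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds threshold (assumed 0)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    return active / total if total else 0.0


# Build 100,000 positions with exactly 48 of them active.
acts = [0.0] * 100_000
for i in range(48):
    acts[i * 2000] = 1.0
print(f"{activation_density(acts):.3%}")  # prints 0.048%
```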