Explanations
instances of the word "but" in various forms and contexts
Negative Logits
Token      Logit
бо         -0.16
ziel       -0.16
olly       -0.16
therefore  -0.15
duct       -0.15
§          -0.15
alike      -0.15
uppe       -0.15
beth       -0.15
ops        -0.15

Positive Logits
Token      Logit
chers      0.26
ters       0.25
term       0.25
lers       0.25
åĩ¡        0.23
tpl        0.22
rint       0.22
ler        0.22
ts         0.22
ressing    0.22
Activations Density: 0.049%