Explanations
the word "Be" in various contexts
Negative Logits
Token       Logit
recht       -0.15
rik         -0.15
ar          -0.15
so          -0.15
.sg         -0.15
rib         -0.15
ri          -0.14
.chapter    -0.14
bidden      -0.14
ww          -0.14
Positive Logits
Token       Logit
atrix       0.28
auf         0.22
arded       0.21
ardless     0.21
heading     0.20
sure        0.20
autiful     0.19
ause        0.19
(Be         0.18
aucoup      0.18
Activations Density: 0.031%