INDEX
Explanations
variations of the word "pr" in different contexts
Negative Logits
Token      Logit
portions   -0.15
i          -0.15
Bri        -0.14
rej        -0.14
erville    -0.14
zon        -0.14
unding     -0.14
ans        -0.14
urb        -0.14
oct        -0.14
Positive Logits
Token      Logit
vi         0.22
vo         0.20
ilik       0.19
ви         0.18
va         0.18
avo        0.17
vim        0.17
vu         0.17
itom       0.17
ve         0.16
Activations Density: 0.002%