Explanations
instances of the pronoun "we" in various contexts
Negative Logits
rze         -0.16
gni         -0.15
ieri        -0.15
omo         -0.15
-valu       -0.15
arshal      -0.15
rosso       -0.14
lector      -0.14
mosaic      -0.14
rippling    -0.14
Positive Logits
acer        0.14
666         0.14
Ø·ÙĨ        0.14
316         0.14
456         0.14
jev         0.13
hek         0.13
irting      0.13
ãĤ¡         0.13
èĹ          0.13
Activations Density: 0.055%