INDEX
Explanations
phrases that indicate comparisons or similarities
Negative Logits
acco      -0.18
them      -0.17
Them      -0.15
IGHL      -0.15
eux       -0.14
orsi      -0.14
nä        -0.14
ことに    -0.14
rou       -0.14
ображ     -0.13
Positive Logits
they      0.28
there     0.26
it        0.25
something 0.22
someone   0.22
able      0.21
nothing   0.21
part      0.20
we        0.19
somebody  0.19
Activations Density: 0.051%