INDEX
Explanations
the word "legs" and words that can be associated with body parts
Negative Logits
-1.55
the  -1.40
in  -1.34
a  -1.20
,  -1.14
↵↵  -1.09
as  -1.08
and  -1.05
of  -1.04
an  -1.03
Positive Logits
متعلقه  1.87
Theſe  1.77
myſelf  1.74
Monfieur  1.69
auffi  1.68
pleaſure  1.66
ſche  1.65
Reſ  1.65
iſt  1.63
itſelf  1.63
Activations Density 1.301%