INDEX
Explanations
references to animals in a contextual or metaphorical way
Negative Logits
лючается    -0.16
was         -0.15
arlo        -0.15
하다        -0.14
должен      -0.14
sta         -0.14
Spear       -0.14
uhe         -0.14
ph          -0.14
asta        -0.14
Positive Logits
são         0.36
estão       0.32
podem       0.31
foram       0.28
sao         0.28
tin         0.26
têm         0.25
serão       0.24
were        0.23
are         0.22
Activations Density 0.008%