Explanations
a father, a doctor, a project
Negative Logits
pouquinho      0.30
льнявыя        0.29
diamètre       0.28
filmpje        0.27
whiche         0.27
wodurch        0.27
яе             0.26
carénés        0.26
interessant    0.26
মিজান          0.26
Positive Logits
(token not shown)  0.45
:              0.39
,              0.30
and            0.26
the            0.25
↵↵             0.24
Chicago        0.24
$              0.24
political      0.24
$              0.24
Activations Density 0.706%