INDEX
Explanations
words related to art and its history
Negative Logits
A
-0.27
u
-0.26
,
-0.26
w
-0.25
-
-0.25
and
-0.25
l
-0.24
f
-0.24
-0.24
the
-0.24
Positive Logits
ож
0.31
ожд
0.25
еж
0.25
ещ
0.24
ращ
0.24
еч
0.24
ё
0.23
нож
0.22
ощ
0.22
ущ
0.22
0.22
Activations Density 0.025%