INDEX
Explanations
demonstrative pronouns indicating specific objects or concepts
Negative Logits
вор      -0.17
aji      -0.17
onym     -0.16
phong    -0.16
ophon    -0.15
пой      -0.15
elda     -0.14
endor    -0.14
adoo     -0.14
odiac    -0.14
Positive Logits
regard    0.40
regards   0.37
instance  0.31
case      0.29
respect   0.28
respects  0.28
vein      0.25
context   0.25
respect   0.24
instance  0.23
Activations Density 0.067%
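
As a rough illustration of what the density figure means, here is a minimal sketch that assumes density is the fraction of token positions on which the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 3,000 token positions, 2 of which activate the feature.
acts = [0.0] * 2998 + [1.3, 0.4]
density = activation_density(acts)
print(f"{density:.3%}")
```

Under that assumption, a density of 0.067% corresponds to the feature firing on roughly 2 of every 3,000 tokens.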