Explanations
possessive forms and phrases indicating ownership or experiences
Negative Logits
NESS         -0.07
человек      -0.07
itself       -0.07
eně          -0.07
елов         -0.07
agrid        -0.07
ulumi        -0.07
sám          -0.07
InstanceOf   -0.07
ivial        -0.06
Positive Logits
minds        0.09
themselves   0.09
lives        0.08
efforts      0.08
hearts       0.07
ongyang      0.07
rights       0.07
choices      0.07
favorite     0.07
ability      0.07
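Logit lists like the two above are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and ranking the resulting scores. This dashboard does not state its exact method, so the following is only a minimal sketch under that assumption. The function name `top_logit_effects` and its inputs are hypothetical, and any normalization a particular dashboard applies is omitted.

```python
import numpy as np


def top_logit_effects(w_dec_row, w_u, vocab, k=10):
    """Rank tokens by how much one feature's decoder direction pushes their logits.

    Assumed shapes (hypothetical inputs, not taken from the dashboard):
      w_dec_row: (d_model,) decoder vector of the feature
      w_u:       (d_model, vocab_size) unembedding matrix
      vocab:     list of vocab_size token strings
    Returns (positive, negative), each a list of (token, score) pairs.
    """
    scores = w_dec_row @ w_u            # direct effect on every token's logit
    order = np.argsort(scores)          # token indices, lowest score first
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return positive, negative


# Toy example: a 2-dimensional "model" with a 4-token vocabulary.
vocab = ["minds", "itself", "lives", "NESS"]
w_u = np.array([[1.0, -1.0, 0.5, -0.5],
                [0.0,  0.0, 0.0,  0.0]])
positive, negative = top_logit_effects(np.array([0.09, 0.3]), w_u, vocab, k=2)
print("positive:", positive)
print("negative:", negative)
```

Sorting once with `np.argsort` and reading both ends yields both lists from a single pass. For a real vocabulary of tens of thousands of tokens, `np.argpartition` could avoid the full sort.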
Activation Density: 0.022%
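Activation density is usually reported as the share of sampled tokens on which the feature fires. At 0.022%, that works out to roughly 22 tokens in every 100,000, so this feature is very sparse. The following is a minimal sketch. It assumes that "fires" means an activation strictly above zero; that threshold is an assumption, and `activation_density` is a hypothetical helper.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Percentage of token positions where a feature's activation exceeds `threshold`.

    `acts` is a 1-D array of one feature's activations over a token sample;
    the default threshold of 0.0 is an assumption.
    """
    acts = np.asarray(acts, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size


# Toy check: a feature that fires on 22 of 100,000 tokens.
acts = np.zeros(100_000)
acts[:22] = 1.5
print(f"{activation_density(acts):.3f}%")
```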