INDEX
Explanations
possessive pronouns and own
Negative Logits
relacionamento    0.43
ability           0.42
日益              0.39
отношению         0.39
看似              0.38
mighty            0.37
talento           0.36
allem             0.36
alph              0.35
aniem             0.35
Positive Logits
own               0.80
Own               0.77
Own               0.75
own               0.73
propia            0.69
OWN               0.68
propio            0.66
propres           0.65
eigene            0.63
propias           0.63
Activations Density 0.007%