INDEX
Explanations
phrases related to personal opinions and reflections on experiences
Negative Logits
彼の  -0.89
him  -0.76
彼女の  -0.75
dessen  -0.69
them  -0.67
Его  -0.67
彼が  -0.67
把他  -0.65
ihn  -0.65
将他  -0.65
Positive Logits
he  0.96
she  0.75
they  0.63
we  0.56
actéristi  0.52
bildēt  0.52
you  0.51
mình  0.50
를  0.46
tantôt  0.45
Activations Density 2.044%