INDEX
Explanations
pronouns and character names
Negative Logits
WITH    0.52
ihrer    0.51
THEIR    0.48
their    0.47
他们的    0.47
themselves    0.46
jų    0.46
他們    0.45
Их    0.45
त्यांच्या    0.44
Positive Logits
fifty    0.47
akke    0.45
няколко    0.42
fifty    0.40
midt    0.40
zelf    0.38
два    0.38
fy    0.37
ና    0.37
რამდენ    0.37
Activations Density 0.012%