Explanations
pronoun followed by verb or possessive
Negative Logits
as          0.72
on          0.63
م           0.62
is          0.59
um          0.59
्यून         0.56
végétaux    0.55
X           0.55
aml         0.54
Estamos     0.54
Positive Logits
他的               0.72
himself           0.71
(no token shown)  0.69
statesmen         0.62
صلى               0.61
his               0.60
Majesty           0.59
mming             0.58
(no token shown)  0.58
получил           0.56
Activations Density 0.280%