INDEX
Explanations
pronouns followed by past actions/states
Negative Logits
использу    0.50
দেশকে       0.49
kullanarak  0.47
Wants       0.44
Allows      0.44
possa       0.43
genutzt     0.43
Want        0.40
Использу    0.40
Recently    0.40
Positive Logits
was              0.56
became           0.53
officially       0.53
began            0.50
remained         0.50
received         0.47
entered          0.45
is               0.44
formally         0.44
comprehensively  0.41
Activations Density 0.008%
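
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed. It assumes density means the fraction of sampled token positions on which the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical and are not taken from the page or from any specific tool.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of sampled positions where the
    feature fires at all; this is an illustrative definition only.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy data (made up): the feature fires on 2 of 25,000 positions.
toy = [0.0] * 24_998 + [1.3, 0.7]
density = activation_density(toy)
print(f"{density:.3%}")
```

With this toy input the feature fires on 2 of 25,000 positions, which matches the order of magnitude of the density reported above.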