INDEX
Explanations
pronouns followed by verbs or prepositions
Negative Logits
classic     0.77
Classic     0.68
dangerous   0.62
দেখাতে      0.61
класси      0.61
जाएं        0.61
cout        0.59
ুগ          0.59
Shadow      0.58
often       0.58
Positive Logits
마음           0.78
अंजीर         0.75
A             0.74
रूम           0.74
Estim         0.74
Das           0.73
We            0.72
ۍ             0.71
Lime          0.70
рэгістрацыі   0.69
Activations Density 0.126%
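
The page does not define the density figure. A minimal sketch, assuming it is the share of sampled tokens on which the feature has a nonzero activation; the function name `activation_density` and the sample values are hypothetical and not taken from the page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of tokens on which the feature fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical activations for 8 tokens; only one is nonzero.
sample = [0.0, 0.0, 3.2, 0.0, 0.0, 0.0, 0.0, 0.0]
print(f"{activation_density(sample):.3%}")  # prints 12.500%
```

Under this reading, a density of 0.126% would mean the feature fires on roughly 1 token in every 800.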