Explanations
prepositions followed by variables or nouns
Negative Logits
البحث    0.61
القيام    0.56
indest    0.50
Между    0.50
ውሃ    0.50
ලබා    0.49
ATUM    0.48
канторы    0.48
Klicken    0.47
حيات    0.47
Positive Logits
and    0.78
as    0.60
the    0.54
1    0.52
\    0.52
và    0.51
\    0.50
(blank)    0.49
↵    0.49
multiple    0.49
Activations Density 0.168%
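
The density figure above can be read as the share of token positions on which the feature fires. The sketch below assumes that definition (an activation strictly greater than a threshold counts as "firing"); the function name `activation_density`, the threshold default, and the toy data are hypothetical and not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions on which the
    feature is active. The dashboard's exact definition may differ.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: the feature fires on 3 of 2000 token positions.
toy_activations = [0.0] * 1997 + [1.2, 0.4, 3.1]
density = activation_density(toy_activations)
print(f"{density:.3%}")
```

A density well below 1%, like the one reported here, would mean the feature is sparse: it is inactive on the vast majority of tokens.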