INDEX
Explanations
prepositions followed by determiners
Negative Logits
3    0.35
DO    0.34
सहित    0.32
מח    0.31
이며    0.31
indicative    0.30
galore    0.30
V    0.30
_    0.30
9    0.30
Positive Logits
tohoto    0.45
dieser    0.41
этот    0.41
this    0.40
О    0.40
một    0.40
Amerikaanse    0.39
этой    0.39
هذا    0.38
cette    0.38
Activations Density 2.872%