INDEX
Explanations
often treated, within contexts
Negative Logits
iaire    0.46
پردا    0.45
ille    0.45
دارند    0.45
ea    0.44
exhibited    0.44
تب    0.44
ués    0.43
ብዙውን    0.43
èves    0.43
Positive Logits
髗    0.48
pressing    0.46
기본적인    0.41
garantia    0.41
친구    0.40
compromising    0.40
mannschaft    0.39
Para    0.39
football    0.39
人を    0.38
Activations Density: 0.002%