Explanation: listing items or alternatives
Negative Logits
that  0.60
I  0.44
toate  0.42
which  0.42
that  0.38
hali  0.38
sesi  0.38
ר  0.37
่  0.36
Positive Logits
其他  0.43
tecnología  0.39
ఇతర  0.38
τότε  0.37
vět  0.37
citep  0.36
políticas  0.36
<unused2026>  0.36
чёр  0.36
política  0.35
Activation density: 0.714%