INDEX
Explanations
introduces definitions or explanations
Negative Logits
۰: 0.64
egip: 0.52
’: 0.52
camas: 0.50
mịn: 0.48
itabbam: 0.48
頂いた: 0.48
счастли: 0.47
temporadas: 0.46
이었: 0.46
Positive Logits
ية: 0.60
是: 0.59
(blank): 0.55
ير: 0.54
is: 0.47
was: 0.46
是我们: 0.45
kamen: 0.45
beelding: 0.44
는: 0.44
Activations Density 1.376%