INDEX
Explanations
dietary restrictions and health
Negative Logits
ש    1.36
ท    1.19
ล    1.17
м    1.14
ب    1.10
나    1.07
ר    1.05
ために    1.04
บ    1.04
υ    1.02
POSITIVE LOGITS
on    1.70
a    1.55
ir    1.53
k    1.20
al    1.15
ten    1.13
ty    1.09
and    1.08
ing    1.06
to    1.05
Activations Density 0.001%