Explanations
those followed by who/that/in
Negative Logits
er  0.85
ות  0.75
es  0.74
ad  0.71
ার  0.67
다  0.66
hypoglycemia  0.64
(  0.63
to  0.62
نے  0.60
Positive Logits
ه  0.84
ని  0.69
ρες  0.64
a  0.64
pesky  0.63
것  0.63
ส์  0.61
פר  0.60
ರು  0.58
lene  0.57
Activations Density 0.030%