INDEX
Explanations
phrases indicating health-related issues and concerns
Negative Logits
INET         -0.16
indow        -0.16
fo           -0.15
astos        -0.15
ردÙĩ         -0.15
PLUS         -0.15
Plus         -0.15
anela        -0.15
BorderStyle  -0.15
ometr        -0.14
Positive Logits
reasons      0.22
åİŁåĽł       0.22
partly       0.19
reason       0.17
because      0.17
Reasons      0.17
partially    0.16
ิà¹Ĥ          0.16
ÙĮ           0.15
822          0.15
Activations Density 0.161%