INDEX
Explanations
the presence of dietary and health-related terminology
Negative Logits
Token       Logit
uno         -0.16
rever       -0.16
pto         -0.14
[url        -0.14
Dustin      -0.14
Bark        -0.14
BACKGROUND  -0.14
زب          -0.14
reckon      -0.14
адÑĥ        -0.13
Positive Logits
Token       Logit
Ðħ          0.15
ONTAL       0.15
âĹĦ         0.14
yor         0.14
moth        0.14
OTOS        0.14
ohn         0.14
otos        0.14
âĦĸ         0.14
âĻ¡         0.14
Activations Density: 0.020%