INDEX
Explanations
claims or statements about health and wellness myths
Negative Logits
lop        -0.15
eve        -0.14
project    -0.14
orc        -0.14
Fucking    -0.14
aug        -0.14
à¸ĺ        -0.14
OrFail     -0.14
Engel      -0.14
obce       -0.13
Positive Logits
Sesso          0.17
Myth           0.16
Skip           0.15
science        0.15
çķª            0.14
OffsetTable    0.14
osis           0.14
scientific     0.14
anes           0.14
Ferd           0.14
Activations Density: 0.081%