INDEX
Explanations
terms related to health, specifically those concerning medical conditions or terms signifying caution and assessment of health risks
Negative Logits
же      -0.20
oi      -0.19
ове     -0.18
uv      -0.17
iw      -0.17
uw      -0.17
евич    -0.16
ил      -0.16
479     -0.16
u       -0.16
Positive Logits
є       0.38
ють     0.37
ється   0.33
єте     0.31
ючи     0.28
ються   0.28
єш      0.25
ємо     0.25
ю       0.21
Є       0.21
Activations Density 0.011%