Explanations
phrases indicating conditions and effects related to health and well-being
on, for, or to a subject
Negative Logits
Token             Logit
Personensuche     -0.74
ViewInit          -0.71
CURIAM            -0.67
CommonModule      -0.66
pleaſure          -0.66
الحره             -0.65
FetchType         -0.65
matchCondition    -0.63
beſ               -0.62
ſtill             -0.62
Positive Logits
Token          Logit
对             0.47
这对           0.46
health         0.45
health         0.43
detrimental    0.42
impact         0.40
對             0.39
harming        0.38
Impact         0.38
对             0.37
Activations Density 0.209%