INDEX
Explanations
references to research findings and expert opinions related to health and social issues
Negative Logits
pid       -0.15
rung      -0.14
subs      -0.14
fet       -0.14
bao       -0.14
yt        -0.14
PID       -0.13
ASSERT    -0.13
iro       -0.13
ighted    -0.13
Positive Logits
Compat      0.18
uner        0.17
ythe        0.16
NES         0.15
éĻ          0.14
Bulk        0.14
ivalence    0.14
kas         0.14
ikers       0.14
ÃĽ          0.14
Activations Density 0.100%