Explanations
phrases related to discussions of medical research and effectiveness
Negative Logits
tre          -0.15
whatever     -0.14
ANNEL        -0.13
θε           -0.13
enci         -0.13
ront         -0.13
èĸ           -0.13
tul          -0.13
ependency    -0.12
_trace       -0.12
POSITIVE LOGITS
how          0.38
how          0.28
why          0.26
如何         0.24
cómo         0.23
what         0.20
exactly      0.19
hoe          0.16
-how         0.16
为什么       0.16
Activations Density 0.197%