Explanations
mentions of health-related topics and locations associated with medical discussions
Negative Logits
cot       -0.16
czy       -0.15
heimer    -0.15
rens      -0.15
illo      -0.14
quine     -0.14
erras     -0.14
seeded    -0.14
lesi      -0.14
zell      -0.14
Positive Logits
iri        0.14
loub       0.14
ê´ij       0.14
ç´         0.14
_PULL      0.13
GIT        0.13
Sullivan   0.13
GetObject  0.13
elli       0.13
GY         0.13
Activations Density 0.009%