INDEX
Explanations
words related to medical or scientific terminology
Negative Logits
al       -0.26
h        -0.22
t        -0.21
on       -0.20
e        -0.19
es       -0.18
alach    -0.18
ein      -0.17
i        -0.17
tle      -0.17
Positive Logits
heck     0.30
chio     0.27
ourt     0.26
̧        0.25
raft     0.25
illin    0.24
loud     0.24
hest     0.24
idal     0.22
ke       0.22
Activations Density: 0.073%