Explanations
the abbreviation "Dr" and related terms indicating a doctor or medical context
NEGATIVE LOGITS
Token     Logit
ende      -0.17
791       -0.17
itta      -0.17
oya       -0.15
raf       -0.15
ahun      -0.15
cerr      -0.15
iqu       -0.15
oub       -0.15
itor      -0.14
POSITIVE LOGITS
Token     Logit
ift       0.29
inks      0.28
inking    0.27
illing    0.26
ives      0.26
ifting    0.25
agnet     0.25
unk       0.25
astic     0.24
ugs       0.22
Activation density: 0.023%