INDEX
Explanations
references to medical professionals, specifically doctors
Negative Logits
ened     -0.17
gio      -0.16
ennon    -0.15
riters   -0.15
ens      -0.15
raf      -0.15
elite    -0.14
ed       -0.14
ende     -0.14
olic     -0.14
Positive Logits
aper     0.22
isc      0.22
iven     0.21
inking   0.20
infeld   0.20
hab      0.20
ifting   0.19
Dr       0.19
illing   0.18
utex     0.18
Activations Density: 0.034%