INDEX
Explanations
mentions of medical professionals, specifically doctors
the presence of a notable professional title or name in a text
Negative Logits
sorting        -0.74
silence        -0.74
eur            -0.73
distribut      -0.72
disag          -0.71
segregation    -0.71
separation     -0.70
divest         -0.69
optics         -0.68
fr             -0.66
Positive Logits
gencies        0.80
apo            0.79
imon           0.75
hov            0.71
imov           0.71
uble           0.70
idia           0.69
ãĥ¤            0.68
yip            0.65
orders         0.65
Activations Density 0.000%
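
The token `ãĥ¤` in the positive logits list looks like a byte-level vocabulary entry rather than readable text. The sketch below shows one way such an entry might be turned back into the characters it stands for. It assumes the model behind this page uses a GPT-2 style byte-level BPE tokenizer, which the page itself does not state; the helper names `bytes_to_unicode` and `decode_token` are illustrative and not taken from the source.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Byte -> printable character table in the style of GPT-2 byte-level BPE.

    Assumption: the tokenizer follows the GPT-2 convention. Other
    tokenizers may use a different table.
    """
    # Bytes that are already printable characters map to themselves.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    codepoints = printable[:]
    shift = 0
    # Every remaining byte is moved to a code point at 256 or above.
    for b in range(256):
        if b not in printable:
            printable.append(b)
            codepoints.append(256 + shift)
            shift += 1
    return {b: chr(c) for b, c in zip(printable, codepoints)}


# Inverse table: displayed character -> raw byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Map a displayed vocabulary entry back to text via its raw UTF-8 bytes."""
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    # A token can end in the middle of a multi-byte character, so
    # undecodable bytes are replaced instead of raising an error.
    return raw.decode("utf-8", errors="replace")


print(decode_token("ãĥ¤"))  # prints ヤ
```

Under that assumption the entry corresponds to the bytes E3 83 A4, the UTF-8 encoding of the katakana character ヤ. Entries made only of ASCII letters, such as `orders`, decode to themselves.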