INDEX
Explanations
phrases related to individuals with medical or academic titles
Negative Logits
chorus         -0.67
CHAT           -0.66
quart          -0.65
Territories    -0.64
curfew         -0.63
spirited       -0.63
routed         -0.61
compens        -0.60
rout           -0.60
coupling       -0.60
Positive Logits
inks           1.11
unks           1.07
umin           1.05
inker          1.05
illing         1.04
inking         1.02
astically      1.00
ifts           0.98
ifting         0.98
herty          0.97
Activations Density 0.255%