Explanations
names of specific entities or figures
Negative Logits
beware       -0.77
patiently    -0.71
aloud        -0.70
forcefully   -0.69
delivered    -0.69
partake      -0.69
thood        -0.69
stopping     -0.67
recite       -0.66
perse        -0.66
Positive Logits
oret         1.28
atre         1.24
odor         1.19
Hague        1.17
resa         1.14
orem         1.09
Economist    1.06
Simpsons     1.06
odore        1.03
Guardian     1.02
Activations Density: 0.463%