INDEX
Explanations
contraction for 'is' or 'was'
Negative Logits
',           0.42
怆           0.39
ldef         0.39
uka          0.38
"',          0.38
axon         0.38
ênd          0.37
lleve        0.37
屹           0.37
\            0.37
POSITIVE LOGITS
asleep       0.63
complaining  0.59
jealous      0.57
snoring      0.56
impatient    0.55
homophobic   0.54
unwell       0.52
allergic     0.51
schizophren  0.51
rumoured     0.51
Activations Density 0.008%