Explanations
instances where someone's job, living conditions, or personal life are described in detail
repeated uses of the word "was."
Negative Logits
Ê            -0.64
cair         -0.64
Applicant    -0.62
Pair         -0.60
inctions     -0.60
gered        -0.58
thora        -0.56
thous        -0.56
Sapp         -0.55
ulic         -0.54
Positive Logits
ivas         0.74
senal        0.67
esthetic     0.66
anium        0.65
advertising  0.62
pawn         0.62
been         0.60
Morning      0.59
likely       0.58
oute         0.56
Activations Density 0.250%