INDEX
Explanations
references to people and groups
"Those" followed by relative pronouns or prepositions
"Those" followed by "who" or a group
Negative Logits
varandra       -0.66
rivista        -0.65
nuages         -0.65
lèvres         -0.64
pères          -0.64
prisonniers    -0.59
itſelf         -0.59
wikipagina     -0.59
cdti           -0.57
первых         -0.57
Positive Logits
who            1.25
pesky          0.88
involved       0.87
that           0.85
whose          0.78
same           0.78
genen          0.76
few            0.75
of             0.74
responsible    0.72
Activations Density: 0.080%