INDEX
Explanations
quotations within written texts
quotation marks and dialogue in the text
Negative Logits
developmental  -0.76
metic          -0.75
derby          -0.71
killer         -0.71
shell          -0.70
accomp         -0.69
sympath        -0.68
carrier        -0.68
cancell        -0.67
destro         -0.66
Positive Logits
there          1.16
we             1.12
trust          1.09
straight       1.08
Jews           1.08
I              1.08
double         1.04
every          1.03
because        1.03
nothing        1.01
Activations Density: 0.101%