INDEX
Explanations
contractions and informal language
references to collective experiences and social dynamics
Negative Logits
Emerson    -0.77
Binding    -0.70
Lyons      -0.69
Seymour    -0.69
Bender     -0.67
Alpine     -0.66
pires      -0.64
Keynes     -0.63
Salmon     -0.63
Swansea    -0.63
Positive Logits
don        1.19
didn       1.18
doesn      1.15
aren       1.13
didn       1.12
shouldn    1.12
ain        1.12
wouldn     1.10
DON        1.08
don        1.01
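The page does not say how the logit scores are computed. A common convention for SAE feature dashboards is to take the dot product of the feature's decoder direction with each column of the model's unembedding matrix, then list the most positive and most negative tokens. The sketch below assumes that convention; `logit_effects` and its parameters are hypothetical names, and real pipelines may also apply normalization that is omitted here.

```python
from typing import List, Tuple


def logit_effects(
    decoder_dir: List[float],
    unembed: List[List[float]],  # assumed shape: [d_model][vocab_size]
    vocab: List[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: project a feature's decoder direction through the
    unembedding matrix and return the k most positive and k most negative
    (token, score) pairs."""
    d_model = len(decoder_dir)
    # scores[j] = sum over i of decoder_dir[i] * unembed[i][j]
    scores = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort ascending by score: the head holds the most negative tokens.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


# Toy example with made-up numbers (not the values listed above).
pos, neg = logit_effects(
    decoder_dir=[1.0, -1.0],
    unembed=[[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]],
    vocab=["tok_a", "tok_b", "tok_c"],
    k=1,
)
print(pos, neg)  # [('tok_a', 1.0)] [('tok_b', -1.0)]
```

Under this convention, a positive score means that when the feature fires, it pushes the model toward predicting that token (here, contraction fragments such as "don" and "didn"), and a negative score means it pushes away from it.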
Activation Density: 0.273%
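Activation density is usually read as the fraction of tokens in a sample on which the feature is active. The sketch below assumes "active" means an activation strictly above zero; the sample size and threshold behind the 0.273% figure are not given on this page, and `activation_density` is a hypothetical helper.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: fraction of tokens whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Toy sample: the feature fires on 3 of 1,000 tokens.
density = activation_density([0.0] * 997 + [0.5, 1.2, 0.1])
print(f"{density:.3%}")  # 0.300%
```

A density of 0.273% therefore corresponds to roughly 2.7 active tokens per 1,000, which is consistent with a sparse feature.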