INDEX
Explanations
phrases indicating relationships, conditions, or nuances in research findings
Negative Logits
them         -0.63
Its          -0.55
Everything   -0.53
THEM         -0.53
thats        -0.52
You          -0.52
writeField   -0.51
They         -0.51
didn         -0.51
Them         -0.50
Positive Logits
there          1.05
considerable   0.98
care           0.93
significant    0.89
substantial    0.85
certain        0.84
additional     0.84
careful        0.82
some           0.81
differences    0.81
Activations Density 2.125%