Explanations
quoted statements within a text
Negative Logits
Token       Logit
arch        -0.78
adjud       -0.72
prelim      -0.70
favor       -0.70
repro       -0.70
listed      -0.69
disappro    -0.68
scheduled   -0.68
separated   -0.68
vault       -0.68
Positive Logits
Token       Logit
We          1.56
Our         1.48
They        1.45
It          1.45
There       1.45
Everybody   1.44
Everything  1.40
Nobody      1.40
I           1.38
People      1.37
Activations Density: 0.326%