Explanations
quotes within double quotation marks
Negative Logits
adjud      -0.80
favor      -0.78
arch       -0.77
prec       -0.74
spr        -0.73
grid       -0.72
scheduled  -0.72
pir        -0.72
prelim     -0.70
ranking    -0.70
POSITIVE LOGITS
We         1.74
It         1.65
They       1.62
There      1.62
Our        1.61
I          1.58
Because    1.55
Everybody  1.53
Nobody     1.53
You        1.52
Activations Density 1.682%