INDEX
Explanations
quotations within the text
Negative Logits
favor         -0.82
parting       -0.81
slam          -0.80
pudding       -0.79
adjud         -0.78
periodic      -0.74
brid          -0.74
classified    -0.74
prec          -0.74
powerhouse    -0.74
Positive Logits
We            1.59
It            1.59
They          1.56
Because       1.53
I             1.51
There         1.50
Obviously     1.47
Nobody        1.45
Our           1.44
You           1.44
Activations Density: 0.484%