INDEX
Explanations
quotations within text
Negative Logits
parting       -0.80
pudding       -0.78
favor         -0.77
secret        -0.70
dispar        -0.70
classified    -0.70
landslide     -0.70
bunk          -0.69
dominate      -0.69
friendly      -0.69
Positive Logits
It            1.48
We            1.46
They          1.44
There         1.42
Because       1.39
Especially    1.37
Sometimes     1.36
I             1.35
But           1.35
Obviously     1.34
Activations Density 1.058%