INDEX
Explanations
mentions of intense or extreme situations, often negative
quotations in the text
Negative Logits
overlook      -0.71
favor         -0.66
scheduled     -0.65
shop          -0.65
lookout       -0.64
derby         -0.64
disapprove    -0.64
termin        -0.64
tro           -0.63
adjud         -0.63
Positive Logits
They          1.27
We            1.25
Our           1.15
It            1.13
I             1.12
There         1.12
Where         1.12
Sometimes     1.10
Too           1.10
Because       1.09
Activations Density 0.103%