INDEX
Explanations
repetitive or nonsensical phrases that seem out of context or unusual in the text
Negative Logits
orah        -0.82
ossibility  -0.79
ittee       -0.73
owler       -0.70
fecture     -0.69
ocese       -0.68
leneck      -0.68
utenant     -0.68
former      -0.67
aucus       -0.67
Positive Logits
amounts     1.38
quantities  1.28
versions    1.23
doses       1.15
solutions   1.13
scenarios   1.13
messages    1.09
situations  1.08
ones        1.07
things      1.07
Activations Density 0.307%