INDEX
Explanations
phrases related to legal or criminal actions and consequences
sentence-ending punctuation, suggesting a focus on the conclusion of statements
Negative Logits
cius        -0.77
agar        -0.75
boro        -0.73
erville     -0.68
proced      -0.63
Rebell      -0.62
ocial       -0.61
¯           -0.60
Alphabet    -0.60
Quit        -0.60
Positive Logits
Among           0.74
Yesterday       0.73
Actor           0.73
Asked           0.73
Specifically    0.72
NPR             0.72
Eight           0.72
WHO             0.71
Earlier         0.71
Exc             0.71
Activations Density: 0.279%