INDEX
Explanations
phrases related to conditions, tasks, or actions that can potentially have consequences or outcomes
pronouns indicating choice and agency
Negative Logits
Forth          -0.71
odore          -0.67
wikipedia      -0.62
Paraly         -0.62
Ammunition     -0.60
Resurrection   -0.60
Journals       -0.60
Orbital        -0.59
ption          -0.59
Eleven         -0.59
Positive Logits
were           0.75
weren          0.74
're            0.73
ammed          0.71
utm            0.70
stray          0.68
ierrez         0.67
regul          0.65
arer           0.65
subscribed     0.65
Activations Density 0.196%
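
To put the density figure in concrete terms: assuming "Activations Density" means the fraction of sampled tokens on which the feature has a nonzero activation (the page does not define it), 0.196% is roughly 2 active tokens per 1,000. The sketch below illustrates that reading; the function name `activation_density` and the sample activations are hypothetical, not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption: density = (number of active tokens) / (total tokens).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up data: 1,000 tokens, of which 2 activate the feature.
sample = [0.0] * 998 + [1.3, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.200%
```

Under this reading, a density this low indicates a sparse feature that fires on only a small share of tokens.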