INDEX
Explanations
phrases prompting urgent action
mandated actions or obligations
Negative Logits
Vish        -0.75
ité         -0.74
Baldwin     -0.73
Gw          -0.71
itar        -0.68
anka        -0.67
Kut         -0.67
Bhar        -0.66
Upton       -0.66
Mushroom    -0.65
Positive Logits
MUST        1.09
OSE         0.95
SHOULD      0.94
ELY         0.89
LECT        0.89
WATCH       0.89
HAEL        0.89
ATCH        0.88
Must        0.85
FILE        0.84
Activations Density: 0.005%