INDEX
Explanations
specific instructions or prohibitions
phrases containing the word "that" in various contexts, indicating a focus on specifying conditions or details
Negative Logits
————        -0.61
whisk       -0.60
ointed      -0.57
apsed       -0.57
ahime       -0.55
eely        -0.54
etched      -0.54
realizing   -0.54
emon        -0.54
Hope        -0.53
Positive Logits
violates      1.23
exceeds       1.21
involves      1.16
contradicts   1.14
occurs        1.08
disagrees     1.08
qualifies     1.06
isn           1.05
doesn         1.05
satisfies     1.01
Activations Density 0.156%
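
As a rough illustration of what a density figure like this usually means, the sketch below assumes that activation density is the fraction of token positions on which the feature fires (activation above zero). That definition, the function name `activation_density`, and the sample data are assumptions made for illustration; they are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of tokens on which the feature is active.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 token positions, 156 of them with a nonzero activation.
sample = [0.0] * 99_844 + [1.0] * 156
density = activation_density(sample)
print(f"{density:.3%}")
```

Under that assumption, a density of 0.156% would mean the feature is active on roughly 156 of every 100,000 tokens.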