INDEX
Explanations
patterns matching the structure 'X after Y'
elements related to transitions or changes in states
Negative Logits
arb       -0.70
utable    -0.70
ortion    -0.67
fit       -0.60
irable    -0.60
metic     -0.59
agon      -0.58
UX        -0.57
Kits      -0.57
ITE       -0.57
Positive Logits
AFTER         1.60
before        1.58
before        1.57
after         1.51
after         1.51
BEFORE        1.48
afterward     1.41
afterwards    1.36
Before        1.26
After         1.25
Activations Density 0.204%
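
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed, assuming density means the fraction of token positions on which the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical illustrations, not the dashboard's actual implementation or data.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of tokens on which the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 token positions, 204 of them active.
sample = [0.0] * 99_796 + [1.3] * 204
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.204%"
```

Under this reading, a density of 0.204% corresponds to the feature firing on roughly 2 of every 1,000 tokens.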