Explanations
phrases related to events or actions happening before a specific point in time
Negative Logits
Token        Logit
twisting     -0.68
humour       -0.68
charming     -0.68
snaps        -0.68
drag         -0.67
laugh        -0.65
beautiful    -0.65
trick        -0.65
adorable     -0.65
comedy       -0.64
Positive Logits
Token        Logit
Prior        3.68
Prior        2.82
Previously   1.56
prior        1.43
Previous     1.30
Before       1.23
IOR          1.13
Earlier      1.05
Priority     1.04
Previous     1.03
Activations Density: 0.020%