Explanations
instructions or guidelines that emphasize consistency and frequency in actions or behaviors
always / never
Negative Logits
Said         -0.45
propOrder    -0.43
Soon         -0.43
Said         -0.41
läng         -0.41
maybe        -0.40
NR           -0.40
Initial      -0.40
Various      -0.39
impati       -0.39
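Positive Logits
ALWAYS        0.81
always        0.81
always        0.79
ALWAYS        0.74
Always        0.73
Always        0.71
deauna        0.69   (Romanian, fragment of "întotdeauna": always)
sempre        0.68   (Italian/Portuguese: always)
zawsze        0.68   (Polish: always)
alltid        0.67   (Swedish/Norwegian: always)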
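The logit lists above rank vocabulary tokens by how strongly the feature pushes them up or down. The sketch below is a minimal illustration, assuming (not confirmed here) that each score is the feature's decoder direction dotted with the model's unembedding column for that token. The names `logit_scores` and `top_and_bottom`, and the toy 2-dimensional model with a 4-token vocabulary, are hypothetical.

```python
from typing import List, Tuple

# Assumption: a feature's "logit" for token t is dot(decoder_dir, W_U[:, t]),
# where W_U is the unembedding matrix stored as d_model rows x vocab_size columns.
def logit_scores(decoder_dir: List[float], unembed: List[List[float]]) -> List[float]:
    d_model = len(decoder_dir)
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]

# Return the k most positive (descending) and k most negative (ascending) tokens.
def top_and_bottom(
    tokens: List[str], scores: List[float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    ranked = sorted(zip(tokens, scores), key=lambda pair: pair[1])
    bottom = ranked[:k]        # most negative first
    top = ranked[::-1][:k]     # most positive first
    return top, bottom

# Hypothetical toy model: 2-dimensional residual stream, 4-token vocabulary.
tokens = ["always", "sempre", "maybe", "Said"]
decoder_dir = [1.0, 0.5]
unembed = [
    [0.6, 0.5, -0.2, -0.3],
    [0.4, 0.3, -0.4, -0.3],
]

scores = logit_scores(decoder_dir, unembed)
top, bottom = top_and_bottom(tokens, scores, 2)
print([t for t, _ in top])     # ['always', 'sempre']
print([t for t, _ in bottom])  # ['Said', 'maybe']
```

Sorting once and slicing both ends keeps the positive and negative lists consistent with each other. For a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid the full sort.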
Activation Density: 0.026%
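Activation density is read here as the share of sampled token positions on which the feature fires. This is an assumption about how the dashboard defines it. The sketch below computes that percentage from a list of per-token activations. The function name `activation_density` and its `threshold` parameter are hypothetical.

```python
from typing import List

# Assumption: density = percentage of token positions whose activation exceeds threshold.
def activation_density(activations: List[float], threshold: float = 0.0) -> float:
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Hypothetical sample: the feature fires on 26 of 100,000 tokens.
acts = [0.0] * 99_974 + [1.3] * 26
density = activation_density(acts)
print(f"{density:.3f}%")  # 0.026%
```

At this density the feature is active on roughly 1 token in 4,000. That makes it a sparse feature, consistent with a narrow "always" concept.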