INDEX
Explanations
phrases emphasizing the need to take a step back and reflect
Negative Logits
è§£         -0.17
erg         -0.15
partially   -0.14
icas        -0.14
discard     -0.14
prepare     -0.13
Prepared    -0.13
пÑĸÑģ       -0.13
Warm        -0.13
idia        -0.13
Positive Logits
pause       0.45
pa          0.44
paused      0.39
Pause       0.37
pauses      0.35
pa          0.34
Pause       0.33
reflect     0.32
Reflect     0.32
pause       0.32
Activations Density 0.044%