Explanations
avoid reckless or wasteful actions
Negative Logits
螨  0.41
蘚  0.40
修订  0.40
ignés  0.40
四季  0.38
闡  0.37
puna  0.36
RECTION  0.36
redact  0.36
configs  0.36
Positive Logits
flashy  0.70
brute  0.69
greedy  0.60
reckless  0.60
recklessly  0.60
erratic  0.59
impatient  0.58
wasteful  0.57
impatience  0.56
chasing  0.54
Activations Density 0.013%