Explanations
verbs that involve halting or preventing something
repeated calls to take action or halt negative behaviors
Negative Logits
Token          Logit
Sov            -0.82
ammy           -0.77
ighth          -0.75
olesc          -0.74
ault           -0.71
aths           -0.69
ety            -0.69
VERTISEMENT    -0.69
uth            -0.67
orthy          -0.67
Positive Logits
Token          Logit
gap            0.95
watching       0.93
watch          0.93
bothering      0.86
stopping       0.76
breathing      0.75
smoking        0.75
reon           0.75
blinking       0.73
door           0.71
Activations Density: 0.025%