Explanations
instances where things are being prevented or prohibited
instances of the word "stop" and its variations in various contexts
Negative Logits
Sov           -0.86
iosyncr       -0.80
ammy          -0.78
ighth         -0.77
ographies     -0.76
orth          -0.76
orthy         -0.75
ramid         -0.75
olesc         -0.70
ocene         -0.70
Positive Logits
bothering      0.93
gap            0.91
raining        0.87
bleeding       0.83
smoking        0.83
watching       0.78
watch          0.77
worrying       0.76
cheating       0.75
trafficking    0.74
Activation Density: 0.038%