Explanations
instances of the word "stopped"
Negative Logits
Token       Logit
龍喚士      -0.74
aths        -0.70
Coliseum    -0.68
ighth       -0.68
arov        -0.67
Sov         -0.66
arden       -0.65
eer         -0.64
rocket      -0.62
adier       -0.60
Positive Logits
Token       Logit
bothering   1.03
abruptly    0.94
watch       0.92
gap         0.84
breathing   0.83
watching    0.81
raining     0.78
blinking    0.77
worrying    0.76
laughing    0.75
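Lists like the two above are commonly produced by scoring every vocabulary token against a feature's output direction and keeping the extremes. The page does not say how its values were computed, so the following is only a sketch under that assumption. The names `logit_effects`, `top_and_bottom`, `decoder_dir`, and `unembed` are hypothetical, and the vectors are toy values, not taken from any model.

```python
# Sketch only: assumes each logit value is the dot product of the feature's
# output (decoder) direction with a token's unembedding vector.
# All names and numbers below are hypothetical toy data.

def logit_effects(decoder_dir, unembed):
    """Return {token: dot(decoder_dir, unembedding row for that token)}."""
    return {
        token: sum(d * w for d, w in zip(decoder_dir, row))
        for token, row in unembed.items()
    }

def top_and_bottom(effects, k):
    """Return the k most positive and the k most negative (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)
    most_positive = ranked[:k]
    most_negative = ranked[-k:][::-1]  # most negative first
    return most_positive, most_negative

decoder_dir = [1.0, -0.5]
unembed = {
    "abruptly": [0.9, -0.1],
    "watching": [0.7, -0.2],
    "rocket": [-0.5, 0.3],
    "Coliseum": [-0.6, 0.2],
    "the": [0.0, 0.0],
}

effects = logit_effects(decoder_dir, unembed)
positive, negative = top_and_bottom(effects, k=2)
print([token for token, _ in positive])
print([token for token, _ in negative])
```

Under this reading, a positive value means the feature pushes the model toward predicting that token when it fires, and a negative value means it pushes away from it.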
Activations Density 0.025%
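To put the density figure in perspective, here is a small sketch that assumes density means the fraction of sampled tokens on which the feature has a nonzero activation. The page does not state its definition, so treat that as an assumption. The function name `activation_density` and the sample data are hypothetical.

```python
# Sketch only: assumes density = (tokens where the feature fires) / (tokens sampled).
# The activation values below are made-up toy data.

def activation_density(activations, threshold=0.0):
    """Fraction of activations strictly above the threshold."""
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)

# One firing token out of 4000 sampled tokens.
sample = [0.0] * 3999 + [2.3]
density = activation_density(sample)

print(f"{density:.3%}")    # 0.025%
print(round(1 / density))  # 4000
```

Read this way, a density of 0.025% corresponds to the feature firing on roughly one token in every 4,000.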