INDEX
Explanations
instances of the word "watch" in various forms
variations of watch
Negative Logits
Token          Logit
Roots          -0.42
Roots          -0.41
grounded       -0.40
"}")           -0.39
grounding      -0.38
eip            -0.38
gyhoeddwyd     -0.38
Rohan          -0.37
Grounds        -0.37
Damian         -0.36
Positive Logits
Token          Logit
watching       1.11
watched        1.03
WATCH          1.02
Watching       1.01
Watching       0.99
watch          0.98
Watched        0.97
Watch          0.96
watched        0.95
Watch          0.92
Activations Density 0.010%