INDEX
Explanations
phrases related to works or functioning
the repeated mention of the word "Works" in various contexts
Negative Logits
limb       -0.75
imply      -0.66
taboo      -0.63
intr       -0.62
jay        -0.62
ent        -0.62
bone       -0.62
uca        -0.62
chasing    -0.60
pr         -0.60
Positive Logits
Works       3.99
Works       2.40
works       2.34
WORK        1.77
works       1.72
Work        1.62
WORK        1.51
Plays       1.27
Work        1.26
Writ        1.24
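The page does not say how the logit tables were computed. A common approach for sparse-autoencoder feature dashboards is to project the feature's decoder direction through the model's unembedding matrix and list the vocabulary tokens with the largest and smallest scores. The sketch below assumes exactly that. The function names `logit_effects` and `top_and_bottom`, and the toy vectors, are hypothetical illustrations, not taken from this feature.

```python
# Hypothetical sketch: rank vocabulary tokens by a feature's direct logit effect.
# ASSUMPTION: effect(token t) = dot(decoder_dir, unembed[:, t]), where decoder_dir
# has length d_model and unembed is a d_model x vocab matrix (list of rows).

def logit_effects(decoder_dir, unembed, vocab):
    """Return (token, effect) for every vocabulary entry."""
    d_model = len(decoder_dir)
    effects = []
    for t, token in enumerate(vocab):
        # Dot product of the decoder direction with column t of the unembedding.
        score = sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        effects.append((token, score))
    return effects


def top_and_bottom(effects, k):
    """Return the k most positive and the k most negative (token, effect) pairs."""
    ranked = sorted(effects, key=lambda pair: pair[1])  # ascending by effect
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


# Toy, made-up numbers (d_model = 2, vocabulary of 4 tokens).
decoder_dir = [1.0, 0.5]
unembed = [
    [2.0, -1.0, 0.0, 0.5],
    [1.0, 0.0, -1.0, 0.5],
]
vocab = ["Works", "limb", "bone", "Plays"]

if __name__ == "__main__":
    pos, neg = top_and_bottom(logit_effects(decoder_dir, unembed, vocab), k=2)
    print("positive:", pos)  # [('Works', 2.5), ('Plays', 0.75)]
    print("negative:", neg)  # [('limb', -1.0), ('bone', -0.5)]
```

With a real model the vocabulary has tens of thousands of entries, so the same ranking would normally be done with a vectorized matrix product rather than a Python loop; the loop is used here only to keep the sketch dependency-free.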
Activation Density: 0.013%
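An activation density of 0.013% means the feature is active on roughly 13 of every 100,000 tokens (0.013% = 0.00013). The page does not state its exact definition; the sketch below assumes density is the fraction of sampled token positions whose activation is strictly above zero. `activation_density` and its `threshold` parameter are hypothetical names.

```python
# Hypothetical sketch of an activation-density statistic.
# ASSUMPTION: density = (# token positions with activation > threshold) / (# positions).

def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose feature activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    # Made-up sample: 100,000 tokens, of which 13 activate the feature.
    sample = [0.0] * 99_987 + [1.0] * 13
    density = activation_density(sample)
    print(f"{density:.3%}")  # 0.013%
```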