    Explanations

    instances of the word "watch" in various forms

    Negative Logits
     Roots          -0.42
    Roots           -0.41
     grounded       -0.40
    "}")            -0.39
     grounding      -0.38
    eip             -0.38
     gyhoeddwyd     -0.38
    Rohan           -0.37
     Grounds        -0.37
     Damian         -0.36
    Positive Logits
     watching       1.11
     watched        1.03
     WATCH          1.02
     Watching       1.01
    Watching        0.99
     watch          0.98
     Watched        0.97
     Watch          0.96
    watched         0.95
    Watch           0.92
    Activation Density: 0.010%

    No Known Activations