    Explanations

    the word "anyone" with a high activation value

    mention of the word "anyone."

    Negative Logits
        Token           Logit
        " Nanto"        -0.66
        " Barg"         -0.62
        " Congo"        -0.61
        " Cast"         -0.61
        "efficient"     -0.60
        "ritz"          -0.59
        "pa"            -0.59
        " Maze"         -0.59
        " Hound"        -0.59
        " Labor"        -0.58
    Positive Logits
        Token           Logit
        " else"         1.51
        "THING"         1.24
        "Else"          1.10
        "omever"        0.99
        " Else"         0.99
        "soever"        0.97
        "else"          0.95
        "ĪĴ"            0.88
        "zzle"          0.87
        " imaginable"   0.83
    Activation Density: 0.016%

    No Known Activations