INDEX
    Explanations

    the name "Ross" with a very strong activation

    mentions of the name "Ross."

    Negative Logits
    Token             Logit
    "rious"           -0.74
    " à¨"             -0.73
    "brance"          -0.69
    "urated"          -0.67
    "ACTED"           -0.67
    "lder"            -0.65
    "ulhu"            -0.64
    " conspicuous"    -0.64
    "undai"           -0.64
    "ع"               -0.63
    Positive Logits
    Token             Logit
    "bach"            1.02
    "etti"            0.96
    "inson"           0.95
    "olini"           0.87
    "lyn"             0.86
    "aunders"         0.86
    "andowski"        0.84
    "iter"            0.80
    "endale"          0.79
    "ys"              0.78
    Activation Density: 0.019%

    No Known Activations