INDEX
    Explanations
    NEGATIVE LOGITS
    hower         -0.74
    hound         -0.73
     "$:/         -0.70
    ļéĨĴ          -0.70
    EStream       -0.62
     friendly     -0.60
     thumbs       -0.60
     limited      -0.59
     deaf         -0.58
    ãĥ¼ãĥ³        -0.58
    POSITIVE LOGITS
    rict           1.22
    alker          1.18
    amped          1.14
    uffed          1.12
    onew           1.10
    itched         1.10
    okes           1.10
    oppable        1.06
    amps           1.06
    amping         1.06
    Act Density 0.422%

    No Known Activations