INDEX
    Explanations
    Negative Logits
    rawtypes    -0.43
     Ten        -0.42
    <eos>       -0.42
    wo          -0.40
     Wayback    -0.40
    ]")]        -0.39
    ark         -0.38
     Zus        -0.36
     Sands      -0.36
     Wo         -0.36
    Positive Logits
     myſelf     0.76
     itſelf     0.75
     ſtill      0.75
     ſtand      0.72
     ſhe        0.71
     raiſ       0.68
     himſelf    0.68
     ſte        0.66
    🏻‍♀️          0.66
     alſo       0.65
    Act Density 0.002%

    No Known Activations