INDEX
    Explanations

    specific names and references to notable figures or collaborations

    Negative Logits
    "Å"             -0.17
    "-UA"           -0.17
    "ua"            -0.15
    "hart"          -0.15
    "UA"            -0.15
    "agas"          -0.15
    "arie"          -0.15
    "iž"            -0.14
    "adora"         -0.14
    "acimiento"     -0.14
    Positive Logits
    " Fal"          0.22
    " Dav"          0.20
    "Fal"           0.19
    "flows"         0.18
    " Bracket"      0.17
    " Basket"       0.17
    " Sey"          0.17
    " Burn"         0.17
    " Ski"          0.17
    " Freeze"       0.17
    Activation Density: 0.012%

    No Known Activations