    Explanations

    proper nouns and specific names

    Negative Logits
    "arme"       -0.15
    "aze"        -0.15
    "esktop"     -0.14
    "azes"       -0.14
    "ackages"    -0.14
    "¿ł"         -0.14
    "acent"      -0.14
    "aces"       -0.14
    "outer"      -0.14
    "iard"       -0.14
    Positive Logits
    " swe"       0.18
    "andard"     0.17
    "hti"        0.16
    "hen"        0.15
    "bjerg"      0.15
    "idan"       0.15
    "hed"        0.15
    "олÑĸ"       0.15
    " Swe"       0.15
    "andra"      0.15
    Activation Density 0.019%

    No Known Activations