    Explanations

    inclusion words

    NEGATIVE LOGITS
    Token             Logit
    " was"            -0.07
    "GORITH"          -0.07
    " فران"           -0.07
    " Berk"           -0.07
    " Thou"           -0.06
    " alma"           -0.06
    " erfolgreich"    -0.06
    (token missing)   -0.06
    " upright"        -0.06
    "CGColor"         -0.06
    POSITIVE LOGITS
    Token             Logit
    "áln"             0.07
    " Lime"           0.06
    " Thickness"      0.06
    "itet"            0.06
    "<C"              0.06
    "θε"              0.06
    "asc"             0.06
    " McC"            0.06
    "vise"            0.06
    "ุม"              0.06
    Activation density: 0.081%

    No Known Activations