INDEX
    Explanations

    Proper nouns/names

    Negative Logits
    or    -0.07
    اصر    -0.07
    =====↵    -0.07
    ***/↵    -0.06
    ابل    -0.06
    わず    -0.06
    قدر    -0.06
     بول    -0.06
     tac    -0.06
     яка    -0.06
    Positive Logits
     surrender    0.07
     Manifest    0.07
    отреб    0.07
     lift    0.07
    eryl    0.07
    ριος    0.07
     substitution    0.07
     sprink    0.07
     insecurity    0.06
     Attributes    0.06
    Activation Density: 0.241%

    No Known Activations