    Explanations

    nouns and phrases that denote positive attributes or endorsements

    Tokens are quoted so that leading spaces are visible; ↵ marks a newline.

    Negative Logits
        " proto"          -0.16
        " Jeh"            -0.15
        " Proto"          -0.15
        ".gb"             -0.14
        " Pike"           -0.14
        " prot"           -0.14
        "blink"           -0.14
        "oucher"          -0.14
        "peat"            -0.14
        "enger"           -0.14

    Positive Logits
        "assen"            0.14
        "dorf"             0.14
        " lateral"         0.14
        "abcdefgh"         0.13
        "china"            0.13
        "inar"             0.13
        "اس"               0.13
        "yst"              0.13
        " heterosexual"    0.13
        "↵↵"               0.13
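Logit lists like the two above are commonly produced by projecting a feature's decoder direction onto the model's unembedding matrix and ranking the resulting per-token scores. The sketch below assumes that convention; it is not necessarily how this particular dashboard computes its numbers. `logit_effects` and `top_bottom_tokens` are hypothetical helper names, and the toy vectors are made-up stand-ins for real model weights.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Dot the decoder direction with each token's unembedding column.

    Assumption: `unembed` is laid out as [d_model][vocab], so column t
    holds the unembedding vector for token t.
    """
    d_model = len(decoder_dir)
    vocab = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(vocab)
    ]


def top_bottom_tokens(effects: Sequence[float],
                      tokens: Sequence[str],
                      k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, score) pairs."""
    ranked = sorted(zip(tokens, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # ascending order: most negative first
    positive = ranked[::-1][:k]    # descending order: most positive first
    return negative, positive


# Toy example with made-up numbers (hypothetical, not real model weights).
decoder_dir = [1.0, -1.0]
unembed = [[0.5, 0.25, -0.25],
           [0.25, 0.5, 0.5]]
tokens = ["a", "b", "c"]

effects = logit_effects(decoder_dir, unembed)
negative, positive = top_bottom_tokens(effects, tokens, k=1)
print(effects)    # [0.25, -0.25, -0.75]
print(negative)   # [('c', -0.75)]
print(positive)   # [('a', 0.25)]
```

Sorting the whole vocabulary costs O(V log V); for a real vocabulary of tens of thousands of tokens, `heapq.nsmallest` and `heapq.nlargest` with the same `key` would avoid the full sort.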
    Activation Density: 0.075%
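Activation density is the share of sampled tokens on which the feature fires. Assuming "fires" means a strictly positive activation (an assumption; the dashboard does not define it here), 0.075% corresponds to roughly 75 active tokens per 100,000. `activation_density` is a hypothetical helper and the sample below is invented purely to illustrate the arithmetic.

```python
from typing import Sequence


def activation_density(activations: Sequence[float]) -> float:
    """Fraction of tokens with a strictly positive activation (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > 0)
    return active / len(activations)


# Hypothetical sample: 75 active tokens out of 100,000.
sample = [0.0] * 99_925 + [1.0] * 75
density = activation_density(sample)
print(f"{density:.3%}")   # 0.075%
```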

    No Known Activations