INDEX
    Explanations

    proper nouns, particularly names and places

    Negative Logits
    krom
    -0.08
    phis
    -0.07
    imb
    -0.06
    oe
    -0.06
    APS
    -0.06
    aps
    -0.06
    DMI
    -0.06
    俗
    -0.06
    erset
    -0.06
    htag
    -0.06
    Positive Logits
    LLU
    0.08
     itself
    0.07
    uiltin
    0.07
    İS
    0.07
    гор
    0.06
    enschaft
    0.06
     Gors
    0.06
     Jenner
    0.06
    ollipop
    0.06
    iolet
    0.06
    Act Density 0.002%

    No Known Activations