    NEGATIVE LOGITS
     polygon        -0.07
    .ge             -0.07
     مار            -0.06
    iations         -0.06
     bütün          -0.06
     pity           -0.06
    _song           -0.06
                    -0.06
     Yen            -0.06
    ischer          -0.06
    POSITIVE LOGITS
     smear          0.07
    nání            0.07
    20              0.07
    .Forms          0.07
     sonra          0.07
    íst             0.07
     그리고            0.07
     Gson           0.06
    MOVE            0.06
     psychiatrist   0.06
    Activation Density: 0.002%

    No Known Activations