INDEX
    Explanations
    NEGATIVE LOGITS
    τς              -0.07
    BJ              -0.07
     vomiting       -0.07
    '";↵            -0.06
    フォ            -0.06
    -suite          -0.06
     simplified     -0.06
     Choir          -0.06
     attribute      -0.06
     nurs           -0.06
    POSITIVE LOGITS
     chaque         0.06
     elementType    0.06
    posted          0.06
     церкви         0.06
    _general        0.06
    ニニニニ        0.06
     Theft          0.06
     burge          0.06
    )o              0.06
    daş             0.06
    Activation Density: 0.013%

    No Known Activations