INDEX
    Explanations

    Prepositions and adverbs

    Negative Logits
    Token         Logit
    " were"       -0.07
    " ignore"     -0.07
    "C"           -0.07
    " zenith"     -0.06
    "<Role"       -0.06
    "ssue"        -0.06
    " myList"     -0.06
    "ñas"         -0.06
    ".left"       -0.06
    ".gradle"     -0.06
    Positive Logits
    Token         Logit
    " gut"        0.07
    " snag"       0.07
    " πρώτη"      0.07
    " الإن"       0.06
    " wichtig"    0.06
    " snork"      0.06
    "mıyor"       0.06
    " thuận"      0.06
    "ificate"     0.06
    "QUI"         0.06
    Activation Density: 0.143%

    No Known Activations