INDEX
    Explanations
    NEGATIVE LOGITS
     Chwiliwch        -0.62
     noten            -0.61
     مرئيه            -0.60
    StoreMessageInfo  -0.58
    LEncoder          -0.56
     Paglinawan       -0.56
    GARET             -0.54
    UMBUS             -0.54
     distanciation    -0.54
    glement           -0.54
    POSITIVE LOGITS
     disagre          1.42
     inappro          1.40
     apprehen         1.40
     emphat           1.34
     uninten          1.31
     impra            1.30
     inconce          1.29
     unspeak          1.28
     increa           1.27
     depic            1.25
    Activation Density: 0.091%

    No Known Activations