    Negative Logits
    Token             Logit
    " WORD"           -0.07
    " Ci"             -0.07
    " Associ"         -0.07
    "Consum"          -0.07
    " Nghị"           -0.07
    (blank)           -0.06
    " criticism"      -0.06
    "ورد"             -0.06
    "هور"             -0.06
    "mel"             -0.06
    Positive Logits
    Token             Logit
    "_receive"        0.07
    " RuntimeObject"  0.06
    " Frankie"        0.06
    "からない"        0.06
    "(ins"            0.06
    "_slot"           0.06
    "/rules"          0.06
    " Swamp"          0.06
    " Wonderland"     0.06
    " бур"            0.06
    Activation Density: 0.002%

    No Known Activations