    Explanations

    equals sign

    Negative Logits
    "DU"                  -0.07
    (token not rendered)  -0.07
    "etto"                -0.07
    "ini"                 -0.07
    "uti"                 -0.07
    " Ted"                -0.07
    "Bob"                 -0.07
    "ROOT"                -0.07
    "joe"                 -0.07
    (token not rendered)  -0.07
    Positive Logits
    " iid"                0.09
    " RHS"                0.08
    " zid"                0.08
    (token not rendered)  0.08
    " communications"     0.08
    " 그렇"               0.08
    " validade"           0.07
    " Communications"     0.07
    " iva"                0.07
    " igualdade"          0.07
    Activation density: 0.069%
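Activation density usually means the percentage of sampled tokens on which the feature fires, that is, where its activation is above zero. The page does not describe its sampling setup or threshold, so the following is a sketch under that assumption. The function name `activation_density` and the `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical sketch: percentage of tokens with activation > threshold."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Made-up activations: the feature fires on 1 of 4 tokens.
print(activation_density([0.0, 0.0, 0.5, 0.0]))
```

Under this definition, a density of 0.069% corresponds to the feature firing on roughly 69 of every 100,000 sampled tokens.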

    No Known Activations