    Explanations

    negations and evaluative expressions

    Negative Logits
    ÑģÑĤи     -0.16
    528       -0.15
     Boss     -0.14
    ARS       -0.14
     záp      -0.14
    ulario    -0.13
    drs       -0.13
    eras      -0.13
    ario      -0.13
    ayed      -0.13
    Positive Logits
    θμ              0.16
    oya             0.15
     Patterson      0.15
    ugu             0.15
    legg            0.14
    captures        0.14
    ugins           0.14
    ownik           0.14
     SetProperty    0.14
    ëŀĢ             0.14
    Activation Density: 0.000%

    No Known Activations