INDEX
    Negative Logits
    Token             Logit
    " sponsorship"    -0.08
    " sollten"        -0.07
    " chụp"           -0.07
    "document"        -0.07
    " pane"           -0.07
    " conco"          -0.06
    " tutte"          -0.06
    " navr"           -0.06
    " memoria"        -0.06
    " theatre"        -0.06
    Positive Logits
    Token             Logit
    "Equality"        0.10
    " equality"       0.09
    " equal"          0.09
    " Equal"          0.09
    " Equality"       0.08
    " EQUAL"          0.08
    " inequality"     0.08
    " EQ"             0.07
    "Equal"           0.07
                      0.07
    Activation Density: 0.033%

    No Known Activations