INDEX
    NEGATIVE LOGITS
    " Vie"          -0.78
    " Gors"         -0.74
    " destro"       -0.70
    " Sylv"         -0.70
    " cler"         -0.66
    " disturbed"    -0.65
    " foremost"     -0.65
    " Rodrigo"      -0.65
    " jog"          -0.64
    " Shiva"        -0.64
    POSITIVE LOGITS
    "define"                              1.29
    "DIV"                                 1.24
    "!/"                                  1.24
    "################################"    1.14
    "include"                             1.13
    "########"                            1.09
    "Gamer"                               1.07
    "###"                                 1.06
    "DonaldTrump"                         0.96
    "NAME"                                0.96
    ACT DENSITY     0.022%

    No Known Activations