INDEX
    Explanations
    Negative Logits
    undai           -0.83
     destro         -0.70
     necessities    -0.68
     conduc         -0.68
     plur           -0.65
     utilitarian    -0.65
    uter            -0.65
    mber            -0.64
    ogram           -0.64
     restraints     -0.64
    Positive Logits
    @#&     1.57
    #$      1.23
    ?!      1.16
    @#      1.10
     :)     1.05
    ãĢį     1.02
    [/      1.00
     ;)     0.97
     ðŁĺ    0.96
     :-)    0.94
    Activation Density: 0.313%

    No Known Activations