INDEX
    Explanations

    words related to opinions or comments on various topics

    instances of objectionable or controversial topics

    Negative Logits
    )"          -0.71
     however    -0.69
    )'          -0.68
     '[         -0.67
     moreover   -0.67
    )</         -0.65
     |--        -0.64
     meanwhile  -0.63
     depends    -0.62
     *)         -0.61
    Positive Logits
    boro        0.58
    LOS         0.57
    etime       0.56
    toggle      0.54
    renheit     0.53
    éĹ          0.53
    DCS         0.52
     gallons    0.51
    Welcome     0.51
    éĸ          0.50
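    The two logit lists are usually read as the tokens whose output logits this feature most suppresses (negative) and most promotes (positive). A common way to produce such lists, assumed here rather than confirmed by this page, is to project the feature's decoder direction through the model's unembedding matrix and rank the vocabulary by the result. The sketch below uses a tiny made-up vocabulary and weights; `logit_effects` and `top_logits` are hypothetical helper names, not part of any dashboard API.

```python
def logit_effects(decoder_dir, unembed, vocab):
    """Score every vocabulary token by decoder_dir @ W_U.

    decoder_dir: list of d_model floats (hypothetical feature direction).
    unembed: d_model rows, each a list of len(vocab) floats (hypothetical W_U).
    """
    scores = []
    for j, tok in enumerate(vocab):
        # Dot product of the feature direction with column j of the unembedding.
        s = sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        scores.append((tok, s))
    return scores


def top_logits(decoder_dir, unembed, vocab, k):
    """Return the k most negative and k most positive (token, score) pairs."""
    ranked = sorted(logit_effects(decoder_dir, unembed, vocab), key=lambda p: p[1])
    negative = ranked[:k]           # most suppressed first
    positive = ranked[::-1][:k]     # most promoted first
    return negative, positive


if __name__ == "__main__":
    vocab = [" however", " gallons", "toggle", " moreover"]
    decoder_dir = [1.0, -0.5]                 # toy d_model = 2
    unembed = [[-0.5, 0.4, 0.2, -0.3],
               [0.4, -0.2, 0.1, 0.6]]
    neg, pos = top_logits(decoder_dir, unembed, vocab, k=2)
    print([t for t, _ in neg])  # [' however', ' moreover']
    print([t for t, _ in pos])  # [' gallons', 'toggle']
```

    Sorting once and slicing from both ends keeps the sketch short; for a real vocabulary of tens of thousands of tokens, a partial selection (for example `heapq.nsmallest` / `heapq.nlargest`) would avoid a full sort.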
    Activation Density: 2.367%
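    Activation density is usually reported as the percentage of token positions on which a feature is active. The sketch below assumes "active" means an activation strictly above a threshold of zero; the dashboard's actual threshold and evaluation dataset are not stated here, and `activation_density` is a hypothetical helper.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds threshold (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    acts = [0.0, 0.3, 0.0, 0.0]  # toy per-token activations
    print(activation_density(acts))  # 25.0
```

    Under this definition, a density of 2.367% means the feature fires on roughly one token in 42 (100 / 2.367 ≈ 42.2).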

    No Known Activations