INDEX
    Explanations

    phrases indicating social dynamics or societal conditions

    Negative Logits
    Token          Logit
    "ЕС"           -0.16
    "ients"        -0.15
    "atsu"         -0.15
    "/lg"          -0.14
    "leta"         -0.14
    "(iOS"         -0.14
    "UNKNOWN"      -0.14
    "unknown"      -0.14
    "eliac"        -0.14
    " unknown"     -0.14
    Positive Logits
    Token          Logit
    " instead"     0.48
    "instead"      0.44
    " rather"      0.37
    " Instead"     0.36
    "Instead"      0.34
    " вмест"       0.33
    " statt"       0.30
    " Rather"      0.29
    "rather"       0.29
    "Rather"       0.26
    Activation Density: 0.008%

    No Known Activations