    Explanations

    words related to contrasting actions or concepts

    phrases indicating a distinction between individual actions and societal influences

    Negative Logits
    " WATCHED"          -0.77
    "mun"               -0.69
    "ļéĨĴ"              -0.68
    "uel"               -0.60
    "Status"            -0.59
    "Ͻ"                 -0.59
    " Flavoring"        -0.58
    "tenance"           -0.57
    " Strongh"          -0.57
    " periodically"     -0.56
    Positive Logits
    " necessarily"      0.83
    " ones"             0.80
    " nor"              0.80
    " slightest"        0.76
    " mention"          0.70
    " anything"         0.68
    " anymore"          0.66
    " anywhere"         0.65
    "YP"                0.65
    "zes"               0.63
    Activation Density: 0.161%

    No Known Activations