INDEX
    Explanations

    phrases expressing skepticism or criticism of societal norms and practices

    Negative Logits
    Token       Logit
    "ochen"     -0.15
    " Succ"     -0.15
    "zek"       -0.14
    "styl"      -0.14
    "zon"       -0.14
    "ctors"     -0.14
    "å°¾"       -0.14
    "обÑĢаÐ"    -0.14
    ".nlm"      -0.13
    "itra"      -0.13
    Positive Logits
    Token       Logit
    " mere"     0.17
    "alone"     0.15
    "mere"      0.15
    " èĢĮ"      0.14
    " Hol"      0.14
    "ingt"      0.14
    "KF"        0.14
    " nor"      0.14
    "umont"     0.14
    "anni"      0.14
    Activation Density: 0.261%

    No Known Activations