    Explanations

    references to societal conditioning and manipulation

    Negative Logits
    Token        Logit
    "ongan"      -0.17
    " Bobby"     -0.17
    "idges"      -0.16
    "üs"         -0.15
    "idenav"     -0.15
    "fon"        -0.15
    " saf"       -0.14
    "unn"        -0.14
    " benef"     -0.14
    "Forms"      -0.14
    Positive Logits
    Token        Logit
    "PELL"       0.17
    "aven"       0.15
    "kola"       0.14
    "ippi"       0.14
    "ofs"        0.14
    "à¸ķลà¸Ńà¸Ķ"  0.13
    " Bilim"     0.13
    "füg"        0.13
    "Shar"       0.13
    ".circular"  0.13
    Activation Density: 0.164%

    No Known Activations