INDEX
    Explanations

    themes related to belief systems and societal divisions

    Negative Logits
    ạ
    -0.13
    segue
    -0.13
    edback
    -0.12
    zar
    -0.12
    amac
    -0.12
    U+00A0 (non-breaking space)
    -0.11
    byn
    -0.11
     unin
    -0.11
    ane
    -0.11
    ünüz
    -0.11
    Positive Logits
     in
    0.65
    在
    0.44
     în
    0.43
     في
    0.39
     在
    0.39
    ใน
    0.39
     فى
    0.32
     در
    0.30
    ，在
    0.30
     ใน
    0.30
    Act Density 0.322%

    No Known Activations