INDEX
    Explanations

    themes related to societal priorities and values

    Negative Logits
    Token            Logit
    "onas"           -0.14
    "-↵"             -0.14
    "isko"           -0.13
    "Č"              -0.13
    "engin"          -0.13
    "-↵↵"            -0.13
    "ipe"            -0.12
    " GOODMAN"       -0.12
    "iÅŁ"            -0.12
    " ìŀĪê³ł"        -0.12
    Positive Logits
    Token            Logit
    ":"              0.66
    "ा:"             0.37
    "ï¼ļ"            0.33
    "à¹Į:"           0.33
    "*:"             0.30
    " :"             0.29
    ":**"            0.27
    "$:"             0.27
    "+:"             0.25
    " namely"        0.25
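
Several token strings in the logit lists (for example `ï¼ļ`, `à¹Į:` and `ìŀĪê³ł`) look like raw vocabulary entries from a byte-level BPE tokenizer rather than readable text. The sketch below assumes, without confirmation from this page, that the model uses the GPT-2-style byte-to-character table; `decode_token` is a hypothetical helper name, not part of any dashboard API. Under that assumption, the string written with escapes in the example decodes to the fullwidth colon (U+FF1A), which would fit the colon-heavy positive logits.

```python
def bytes_to_unicode():
    """Build the byte -> printable character table used by GPT-2-style BPE.

    Assumption: the tokenizer behind this dashboard uses this table.
    """
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    for b in range(256):
        if b not in printable:
            # Non-printable bytes are remapped to code points starting at 256.
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return dict(zip(byte_values, map(chr, code_points)))


# Inverse table: displayed character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token):
    """Hypothetical helper: map a raw vocabulary string back to text."""
    raw = bytearray()
    for ch in token:
        if ch in CHAR_TO_BYTE:
            raw.append(CHAR_TO_BYTE[ch])
        else:
            # Characters outside the table (e.g. a plain space that the
            # dashboard already decoded) are passed through unchanged.
            raw.extend(ch.encode("utf-8"))
    # Tokens can end mid-character, so invalid bytes are replaced, not raised.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # "\u00ef\u00bc\u013c" is the escaped form of the token shown as ï¼ļ.
    print(repr(decode_token("\u00ef\u00bc\u013c")))
    print(repr(decode_token(" namely")))
```

Tokens that are already readable, such as `:` or ` namely`, pass through unchanged, so the helper can be applied to a whole logit list without special-casing.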
    Activation Density: 0.731%

    No Known Activations