    Explanations

    phrases that express guiding principles or mottos

    Negative Logits
    "iei"         -0.16
    "oje"         -0.15
    " Mention"    -0.15
    " إد"         -0.15
    "λικ"         -0.15
    "abar"        -0.15
    "mention"     -0.14
    "stery"       -0.14
    "äch"         -0.14
    "ráf"         -0.14
    Positive Logits
    " motto"      0.55
    " slogan"     0.53
    " theme"      0.47
    "theme"       0.40
    " Theme"      0.38
    " slogans"    0.37
    "logan"       0.35
    "Theme"       0.35
    " themes"     0.34
    " THEME"      0.32
    Activation Density 0.264%

    No Known Activations