    Explanations

    terms related to various philosophical and ideological positions

    Negative Logits
    er        -0.22
    izer      -0.17
    nier      -0.16
    an        -0.15
    杉        -0.15
    smith     -0.15
    umas      -0.15
    度        -0.15
    anine     -0.15
    thon      -0.15
    Positive Logits
    (ic       0.22
    ically    0.21
    ische     0.21
    -leaning  0.21
    otle      0.20
    isches    0.17
    ycz       0.16
    lero      0.16
    ич        0.16
    ischer    0.15
    Activation Density: 0.084%

    No Known Activations