    Explanations

    instances of the word "I," indicating a focus on personal perspectives or self-references

    Negative Logits
    Token       Logit
    " Sim"      -0.18
    "Sim"       -0.16
    "ansi"      -0.16
    "ota"       -0.15
    " forms"    -0.15
    "858"       -0.15
    "uy"        -0.15
    " Men"      -0.15
    "aken"      -0.14
    " San"      -0.14
    Positive Logits
    Token          Logit
    "gie"          0.17
    "lic"          0.17
    "bler"         0.17
    "ilon"         0.16
    "aul"          0.15
    "_CN"          0.15
    "opic"         0.15
    "#"            0.15
    "ãĥĭãĥĥãĤ¯"    0.15
    "mai"          0.14
    Activation Density: 0.018%

    No Known Activations