    Explanations

    occurrences of the word "I" and its related forms, indicating a focus on personal experiences or perspectives

    Negative Logits
    Token            Logit
    "pong"           -0.14
    "wand"           -0.14
    " AppState"      -0.14
    "fans"           -0.13
    "ök"             -0.13
    "lou"            -0.13
    "-us"            -0.13
    "APPER"          -0.13
    "WARDED"         -0.13
    "lk"             -0.13
    Positive Logits
    Token            Logit
    "ample"          0.17
    (whitespace)     0.16
    " Morav"         0.15
    " alike"         0.15
    "ager"           0.14
    "ihil"           0.14
    "HAL"            0.14
    "istr"           0.14
    "SBATCH"         0.14
    ".MouseAdapter"  0.13
    Activation Density: 0.015%

    No Known Activations