INDEX
    Explanations

    References to URLs and file paths, particularly those related to code repositories.

    Negative Logits
    "iesta"      -0.17
    "entine"     -0.16
    "oric"       -0.15
    "hma"        -0.14
    " McLaren"   -0.14
    "ç±į"        -0.14
    "erus"       -0.14
    "oller"      -0.14
    " dump"      -0.14
    " lorem"     -0.14
    Positive Logits
    " badge"     0.25
    "badge"      0.23
    " CI"        0.22
    ".bad"       0.22
    "-badge"     0.21
    " Badge"     0.21
    "BAD"        0.21
    "_bad"       0.21
    " shields"   0.20
    " badges"    0.20
    Activation Density: 0.006%

    No Known Activations