INDEX
    Explanations

    Notable references to literature, specifically titles and significant terms related to George Orwell's works.

    Negative Logits
    "ilename"   -0.17
    "lesc"      -0.15
    "åĽ"        -0.15
    "ç±"        -0.14
    "ानम"       -0.14
    "ámara"     -0.14
    ".asc"      -0.14
    "ále"       -0.14
    "à¸ļร"      -0.14
    "orado"     -0.14
    Positive Logits
    " Im"       0.20
    "Im"        0.19
    "im"        0.18
    " im"       0.18
    "elon"      0.17
    "lew"       0.17
    " им"       0.17
    "/im"       0.17
    "imb"       0.16
    "IM"        0.16
    Activation Density 0.010%
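    The page does not say how activation density is computed. Assuming it is the fraction of sampled token positions on which the feature's activation is positive, 0.010% works out to roughly one firing token per 10,000. The sketch below illustrates that reading. The `activation_density` helper, the zero threshold, and the synthetic activations are all hypothetical and do not come from the dashboard's own pipeline.

```python
from typing import Sequence


def activation_density(acts: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions where the feature fires,
    with "fires" taken to mean activation > threshold (0.0 by default).
    """
    if len(acts) == 0:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in acts if a > threshold)
    return fired / len(acts)


# Synthetic example: the feature fires on 1 of 10,000 sampled positions.
acts = [0.0] * 10_000
acts[1234] = 3.5
density = activation_density(acts)
print(f"{density:.3%}")  # 0.010%
```

    At this density, a corpus sample of 100,000 tokens would be expected to contain only about ten firing positions. That can make it easy for a sampling run to surface none at all.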

    No Known Activations