    NEGATIVE LOGITS
    "edii"       -0.07
    ".place"     -0.06
    "idlo"       -0.06
    "Lemma"      -0.06
    "kud"        -0.06
    "_erase"     -0.06
    "loyment"    -0.06
    " сил"       -0.06
    "_words"     -0.06
    "det"        -0.06
    POSITIVE LOGITS
    "092"            0.08
    "488"            0.07
    "533"            0.07
    "106"            0.07
    "[^"             0.07
    " Revolution"    0.06
    " Sir"           0.06
    " weighing"      0.06
    "ِر"             0.06
    " Artifact"      0.06
    Activation Density: 0.013%

    No Known Activations