INDEX
    Explanations

    references to specific measurements or quantities

    Negative Logits
    "edin"     -0.16
    "king"     -0.15
    "up"       -0.15
    "aad"      -0.14
    "iri"      -0.14
    " fron"    -0.14
    "uen"      -0.14
    "ney"      -0.14
    "oga"      -0.14
    "æŃ"       -0.13
    Positive Logits
    "tsky"         0.17
    ".).↵↵"        0.16
    "."            0.15
    ".С"           0.15
    "SetBranch"    0.15
    " HOLDERS"     0.15
    "fsp"          0.15
    "elter"        0.15
    ".:."          0.15
    "ICLE"         0.15
    Activation Density: 0.268%

    No Known Activations