INDEX
    Explanations

    verbatim text

    Negative Logits
     describes      -0.06
     nokt           -0.06
     channel        -0.06
    anou            -0.06
    _("             -0.06
     grains         -0.06
    .bz             -0.06
    []"             -0.06
     widths         -0.06
    Bs              -0.06
    Positive Logits
     excellence     0.07
    /Instruction    0.06
    degrees         0.06
     GN             0.06
    ello            0.06
     brace          0.06
    ('');↵          0.06
     опер           0.06
     frowned        0.06
     craftsmanship  0.06
    Act Density 0.013%

    No Known Activations