INDEX
    Explanations

    sections of text that provide informative content

    Negative Logits
    imal            -0.17
    ager            -0.16
    (s              -0.15
    ol              -0.14
     already        -0.14
     policy         -0.14
     Superior       -0.14
    æŃ              -0.13
     point          -0.13
    ino             -0.13

    Positive Logits
    лини             0.16
    .microsoft       0.16
    theid            0.16
    formace          0.15
    istrovstvÃŃ      0.14
    йом              0.14
    .fromFunction    0.14
    _tokenize        0.14
    fono             0.14
    ån               0.14
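    A minimal sketch of how ranked lists like the two above are commonly produced. It assumes (the page does not say so) that each score is the feature's decoder direction projected through the model's unembedding matrix, after which the vocabulary is sorted and the k lowest and k highest entries are kept. The helper name top_logit_tokens and its plain-list inputs are hypothetical; a real pipeline would operate on tensors.

```python
def top_logit_tokens(decoder_dir, unembed, vocab, k=10):
    """Hypothetical helper: rank vocabulary tokens by one feature's direct logit effect.

    decoder_dir: list[float], length d_model (the feature's decoder row; assumed)
    unembed:     d_model rows, each a list[float] of length vocab_size (W_U; assumed)
    vocab:       list[str], the token string for each vocabulary index
    """
    d_model = len(decoder_dir)
    vocab_size = len(vocab)
    # logits[j] = sum_i decoder_dir[i] * W_U[i][j], i.e. the decoder row times W_U
    logits = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(vocab_size)
    ]
    # Sort vocabulary indices from most negative to most positive score
    order = sorted(range(vocab_size), key=lambda j: logits[j])
    negative = [(vocab[j], logits[j]) for j in order[:k]]
    # Largest scores first for the positive list
    positive = [(vocab[j], logits[j]) for j in reversed(order[-k:])]
    return negative, positive


if __name__ == "__main__":
    neg, pos = top_logit_tokens(
        [1.0, 0.0],
        [[0.5, -0.2, 0.1], [9.0, 9.0, 9.0]],
        ["a", "b", "c"],
        k=1,
    )
    print(neg, pos)
```

    With this toy input, token "b" comes out as the most negative and "a" as the most positive, mirroring the two lists on the page.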
    Activation Density 0.091%
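    Activation density is the share of tokens on which the feature fires. A minimal sketch, assuming a token counts as firing when its activation is strictly above zero (the page does not state the threshold it uses); activation_density is a hypothetical helper:

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical helper: percentage of tokens whose activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    # Count tokens on which the feature is active
    firing = sum(1 for a in activations if a > threshold)
    # Express the fraction of tokens as a percentage
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    print(activation_density([0.0, 0.0, 0.5, 0.0]))
```

    Read this way, a density of 0.091% means the feature fires on roughly 91 of every 100,000 tokens.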

    No Known Activations