INDEX
    Explanations

    references to code documentation and annotations

    Negative Logits
        " poffe"        -0.45
        " pleaſure"     -0.44
        " faſt"         -0.43
        " pleaf"        -0.42
        "<bos>"         -0.41
        " raiſ"         -0.40
        " fuper"        -0.40
        " fometimes"    -0.40
        "Fase"          -0.40
        " itching"      -0.39
    Positive Logits
        "{@"             2.66
        " {@"            2.39
        ">{@"            1.37
        " '{@"           1.28
        " }{@"           0.97
        "}{@"            0.96
        "(!__"           0.79
        " незавершена"   0.78
        "(@"             0.77
        " (@"            0.76
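    One common way such top negative and positive logit lists are produced is to project the feature's decoder direction through the model's unembedding matrix and rank the resulting per-token scores. The sketch below assumes that convention; it is not confirmed to be how this dashboard computes its values. The function name `logit_effects`, the toy vocabulary, and every number in it are hypothetical, and lists stand in for real model weights.

```python
from typing import List, Tuple


def logit_effects(
    decoder_row: List[float],
    unembed: List[List[float]],
    vocab: List[str],
    k: int = 3,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Rank tokens by decoder_row @ unembed (hypothetical helper).

    unembed is laid out as d_model rows by vocab_size columns.
    Returns (top-k positive, top-k negative) (token, score) pairs.
    """
    d_model = len(decoder_row)
    scores = []
    for j, token in enumerate(vocab):
        # Dot product of the feature direction with token j's unembedding column.
        s = sum(decoder_row[i] * unembed[i][j] for i in range(d_model))
        scores.append((token, s))
    ranked = sorted(scores, key=lambda pair: pair[1])  # most negative first
    negative = ranked[:k]
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


# Toy example: a 2-dimensional model with a 4-token vocabulary (all values made up).
vocab = ["{@", " (@", " poffe", "<bos>"]
decoder_row = [1.0, -0.5]
unembed = [
    [2.0, 0.5, -0.3, 0.0],
    [-1.0, 0.2, 0.4, 0.1],
]
positive, negative = logit_effects(decoder_row, unembed, vocab, k=2)
print("positive:", positive)
print("negative:", negative)
```

    With these toy weights, "{@" receives the largest positive score and " poffe" the most negative one, mirroring the shape (though not the values) of the lists above.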
    Activation Density 0.223%
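    Activation density is usually read as the percentage of token positions on which the feature fires, so 0.223% corresponds to roughly 2.2 of every 1,000 tokens. The sketch below assumes "fires" means an activation strictly above a threshold of 0.0, a common convention for ReLU-style sparse autoencoders. The function name `activation_density` and the sample activations are hypothetical.

```python
from typing import List


def activation_density(activations: List[List[float]], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds threshold (hypothetical helper)."""
    total = 0
    active = 0
    for sequence in activations:
        for value in sequence:
            total += 1
            if value > threshold:  # assumed definition of "the feature fires"
                active += 1
    return 100.0 * active / total if total else 0.0


# Two toy sequences of 5 tokens each; the feature fires on exactly one position.
density = activation_density([[0.0, 0.0, 0.7, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0]])
print(f"{density:.3f}%")

# 223 firing positions out of 100,000 tokens gives the 0.223% shown above.
example = activation_density([[1.0] * 223 + [0.0] * (100_000 - 223)])
print(f"{example:.3f}%")
```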

    No Known Activations