INDEX
    Explanations

    words and phrases indicating actions or directives

    Negative Logits
        Token            Logit
        "Switcher"       -0.76
        "jetbrains"      -0.72
        " gethan"        -0.70
        " sonne"         -0.68
        " Florentine"    -0.68
        " fubject"       -0.67
        " Shakspeare"    -0.67
        "fisher"         -0.66
        " NDEBUG"        -0.65
        " scattata"      -0.65
    Positive Logits
        Token            Logit
        " to"            1.58
        " TO"            1.17
        " To"            1.08
        " be"            0.98
        " να"            0.97
        "To"             0.96
        " make"          0.94
        " zu"            0.91
        "to"             0.88
        "yto"            0.87
    Act Density 2.480%

    No Known Activations