    Negative Logits
    Token         Logit
    " wraps"      -0.96
    " under"      -0.91
    " WRAP"       -0.91
    " Wrapper"    -0.90
    " Wraps"      -0.88
    " Wrap"       -0.85
    "Pah"         -0.81
    "wraps"       -0.80
    "Hilton"      -0.80
    " Minden"     -0.79
    Positive Logits
    Token              Logit
    " fece"            0.91
    "aney"             0.91
    " awesome"         0.89
    " Vertrauen"       0.85
    " prato"           0.85
    "............."    0.84
    "HEET"             0.83
    " spectacular"     0.82
    " amazing"         0.82
    "ree"              0.82
    Activation Density: 0.004%

    No Known Activations