INDEX
    Explanations
    No Explanations Found
    Negative Logits
    Token      Logit
    " to"      -1.02
    " do"      -0.98
    " have"    -0.97
    " a"       -0.97
    " all"     -0.96
    " in"      -0.95
    " no"      -0.94
    " an"      -0.93
    " so"      -0.93
    " are"     -0.91
    Positive Logits
    Token           Logit
    "<bos>"         8.21
    " ftu"          2.06
    " fta"          2.02
    " fatis"        1.94
    " dispen"       1.94
    " fup"          1.90
    " paff"         1.84
    " poff"         1.83
    "expandindo"    1.83
    " ftre"         1.83
    Act Density 0.000%

    No Known Activations

    This feature has no known activations.