INDEX
    Explanations
    No Explanations Found
    Negative Logits
    Token      Logit
    " so"      -1.53
    " in"      -1.50
    " to"      -1.50
    ","        -1.47
    " as"      -1.46
    " that"    -1.45
    " we"      -1.44
    " was"     -1.43
    " can"     -1.43
    " for"     -1.43
    Positive Logits
    Token        Logit
    "<bos>"      10.93
    " ftu"       3.06
    " fta"       2.93
    " fatis"     2.88
    " sappi"     2.82
    " dispen"    2.81
    " ftre"      2.80
    " fup"       2.79
    " squa"      2.75
    " paff"      2.73
    Activation Density: 0.000%

    No Known Activations

    This feature has no known activations.