    Negative Logits
    Token              Logit
    ","                -0.19
    (blank in source)  -0.18
    " list"            -0.17
    " type"            -0.15
    " form"            -0.15
    "/th"              -0.15
    "-s"               -0.15
    " particularly"    -0.15
    " fire"            -0.15
    " set"             -0.15
    Positive Logits
    Token              Logit
    " There"           0.29
    " Although"        0.28
    " While"           0.28
    " This"            0.27
    " Since"           0.27
    " It"              0.26
    " However"         0.26
    " Though"          0.26
    " The"             0.26
    " During"          0.26
    Activation Density: 2.072%

    No Known Activations