    NEGATIVE LOGITS
    Token          Logit
    "'"            -1.02
    (blank)        -1.00
    " is"          -1.00
    " has"         -0.84
    " applies"     -0.84
    " grows"       -0.83
    " plays"       -0.81
    " reacts"      -0.80
    " operates"    -0.80
    " develops"    -0.79
    POSITIVE LOGITS
    Token          Logit
    " were"        1.19
    " are"         1.08
    " have"        0.99
    " aren"        0.86
    "were"         0.85
    " WERE"        0.84
    " weren"       0.82
    " appear"      0.79
    " seem"        0.78
    " happen"      0.77
    Activation Density: 0.106%

    No Known Activations