INDEX
    Explanations

    how something is affected

    NEGATIVE LOGITS
    "something"     0.83
    "Doctors"       0.83
    " cosas"        0.82
    "Something"     0.80
    "incredible"    0.79
    "Berikut"       0.77
    " nifty"        0.77
    " постепен"     0.77
    " mooie"        0.77
    " എന്ത"         0.77
    POSITIVE LOGITS
    " newly"            1.09
    " heavily"          0.92
    " participating"    0.89
    " recently"         0.86
    " affected"         0.86
    " previously"       0.86
    " poorly"           0.83
    " sampled"          0.81
    " freshly"          0.80
    " intensively"      0.80
    Act Density: 0.788%

    No Known Activations