    Negative Logits
    Token             Logit
    " realization"    -1.55
    " realisation"    -1.51
    " realising"      -1.09
    " realizing"      -1.08
    " realize"        -0.91
    " realise"        -0.90
    " realised"       -0.89
    " realized"       -0.89
    " realiz"         -0.89
    " realizes"       -0.87
    Positive Logits
    Token             Logit
    " herself"         0.61
    " himself"         0.61
    " themselves"      0.60
    " of"              0.56
    "herself"          0.52
    "himself"          0.51
    "themselves"       0.51
    " ourselves"       0.51
    " properly"        0.48
    " yourself"        0.47
    Activation Density: 0.043%

    No Known Activations