INDEX
    Explanations

    specific actions like reading, watching, examining, and talking

    actions that involve reading, watching, or reviewing content

    Negative Logits
    Token             Logit
    "essert"          -0.68
    "Els"             -0.67
    "equal"           -0.65
    " harbour"        -0.65
    "amen"            -0.63
    "interstitial"    -0.62
    "arded"           -0.62
    " unfairly"       -0.62
    "Enjoy"           -0.59
    "IVE"             -0.58
    POSITIVE LOGITS
    hran
    0.68
     foregoing
    0.67
     assurances
    0.67
    ¿½
    0.64
     acquaint
    0.62
    romy
    0.61
    reports
    0.60
    math
    0.59
     KH
    0.59
     KS
    0.59
    Activation Density 0.491%

    No Known Activations