INDEX
    Explanations

    specific actions or observations within a sentence, such as noticing, discovering, or finding

    Negative Logits
    Token         Logit
    "youtube"     -0.68
    "ula"         -0.66
    "ugal"        -0.65
    "hattan"      -0.62
    "charge"      -0.62
    "duty"        -0.61
    "href"        -0.61
    "raviolet"    -0.61
    "phrine"      -0.61
    "osc"         -0.60
    Positive Logits
    Token              Logit
    " unmist"          0.90
    " similarities"    0.88
    " how"             0.83
    " nothing"         0.82
    " what"            0.81
    " none"            0.80
    " plenty"          0.79
    " myriad"          0.79
    " something"       0.79
    " startling"       0.78
    Activation Density: 0.267%

    No Known Activations