INDEX
    Explanations

    phrases indicating simplicity or ease of understanding

    Negative Logits
    hips        -0.78
    eters       -0.78
    reon        -0.75
    grave       -0.74
    raints      -0.72
    mbuds       -0.68
    orp         -0.66
    orf         -0.64
    arians      -0.63
    emp         -0.62

    Positive Logits
    Jet         0.93
    going       0.91
     prey       0.80
    wallet      0.80
    coded       0.78
     easy       0.72
    accessible  0.70
     forgiving  0.70
     minded     0.70
    ily         0.69

    Activation Density: 0.024%

    No Known Activations