INDEX
    Explanations

    positive or impactful actions or characteristics described in text

    actions or events that have a significant impact or influence on circumstances

    Negative Logits
    Token          Logit
    "arta"         -0.80
    ">]"           -0.72
    "ilings"       -0.69
    "picture"      -0.66
    "guide"        -0.63
    "secondary"    -0.61
    "photo"        -0.61
    "}}"           -0.61
    "device"       -0.60
    "oscope"       -0.60
    Positive Logits
    Token            Logit
    " raining"       0.80
    " downhill"      0.69
    " doub"          0.68
    " uphill"        0.68
    "escap"          0.65
    " overload"      0.63
    " triv"          0.63
    "atche"          0.63
    " impossible"    0.62
    "ãĥķãĤ©"         0.62
    Activation Density: 0.925%

    No Known Activations