INDEX
    Explanations

    words indicating likelihood or possibility

    phrases indicating perceived qualities or characteristics

    Negative Logits
    estern      -0.98
    ests        -0.77
    ilts        -0.76
    iding       -0.72
    orthern     -0.72
    atform      -0.68
    loads       -0.67
    atching     -0.67
    aign        -0.66
    leasing     -0.65
    Positive Logits
    rils            0.88
     oddly          0.81
     awfully        0.79
     strangely      0.79
     innocuous      0.78
     plaus          0.77
     mysteriously   0.76
    Pause           0.76
     poised         0.73
     like           0.72
    Act Density 0.059%

    No Known Activations