    Explanations

    phrases indicating agreement or confirmation

    phrases expressing an opinion or assessment about something

    Negative Logits
    Token         Logit
    "apsed"       -0.76
    "isin"        -0.72
    "oled"        -0.71
    "uve"         -0.70
    "isner"       -0.67
    "cot"         -0.67
    "keyes"       -0.65
    " Lann"       -0.65
    "jac"         -0.64
    "aredevil"    -0.62
    Positive Logits
    Token         Logit
    " louder"     0.88
    "tracks"      0.81
    "Sounds"      0.79
    "lessly"      0.79
    "\\\\\\\\"    0.79
    " omin"       0.78
    " suspic"     0.78
    " vaguely"    0.77
    " sounding"   0.77
    "bite"        0.76
    Activation Density: 0.021%

    No Known Activations