INDEX
    Explanations

    phrases indicating certainty or strong belief

    conversational phrases expressing opinions or reactions

    Negative Logits
    İĭ            -0.76
    ership        -0.73
     Presence     -0.70
    é¾įå¥ij士     -0.70
    Contents      -0.69
    assembly      -0.65
    -+-+          -0.63
    Rated         -0.63
     Assist       -0.62
    ãĤ¢ãĥ«        -0.62

    Positive Logits
     forgot       0.95
     sounds       0.81
     sounded      0.80
     kinda        0.79
     sucks        0.78
     kidding      0.78
     spelled      0.75
     fooled       0.75
     tricked      0.71
     got          0.71
    Act Density 0.223%

    No Known Activations