INDEX
    Explanations

    phrases indicating contrast or continuation

    phrases that indicate agreement or similarity in sentiments

    Negative Logits
    ]),
    -0.81
    arthed
    -0.76
    ".[
    -0.70
    ]).
    -0.70
    ."[
    -0.64
     ])
    -0.63
    ).[
    -0.62
    è¦ļéĨĴ
    -0.62
     respectively
    -0.61
     âĨij
    -0.60
    Positive Logits
    elia
    0.79
     uh
    0.73
    unny
    0.71
    uin
    0.70
     kidding
    0.70
    romeda
    0.69
    obin
    0.68
     Courier
    0.67
     funn
    0.67
     yeah
    0.66
    Act Density 0.310%

    No Known Activations