INDEX
    Explanations

    phrases expressing surprise or realization

    phrases expressing surprise or disbelief

    Negative Logits
    Dialogue    -0.81
    adr         -0.72
    ribut       -0.70
    Rel         -0.69
    gp          -0.64
    ioxide      -0.63
    utic        -0.62
    verend      -0.61
    rough       -0.60
    bis         -0.60
    Positive Logits
     beforehand   0.75
     bothered     0.70
     Saban        0.70
     Bout         0.64
     existed      0.63
     myself       0.63
     Kamp         0.63
     intending    0.62
    terday        0.62
     spoiled      0.61
    Activation Density: 0.317%

    No Known Activations