INDEX
    Explanations

    surprising or amazed reactions in text

    phrases that express surprise or unexpectedness

    Negative Logits
    "burgh"           -0.80
    "illes"           -0.67
    " bye"            -0.66
    " throats"        -0.66
    "aim"             -0.62
    "ère"             -0.61
    "alach"           -0.60
    "ulton"           -0.59
    "Recommended"     -0.59
    " fid"            -0.59
    Positive Logits
    " similarities"    0.81
    " how"             0.70
    " similarity"      0.69
    " discrepancies"   0.68
    "fusc"             0.67
    "how"              0.65
    " Conserv"         0.64
    " unanim"          0.64
    " parallels"       0.64
    " surprise"        0.64
    Activation Density: 0.239%

    No Known Activations