INDEX
    Explanations

    phrases that express surprise or disbelief

    expressions of disbelief or surprise

    Negative Logits
     guiActiveUnfocused     -0.80
    20439                   -0.71
     Reprodu                -0.70
     Springer               -0.69
    enthal                  -0.69
     mosqu                  -0.69
    ously                   -0.68
    creen                   -0.64
    eer                     -0.64
     association            -0.63
    Positive Logits
    ª                       1.09
    ł                       1.07
    IJ                       0.95
    Ĵ                       0.94
    ij                       0.92
    ı                       0.91
    once                    0.90
    «                       0.89
    ¹                       0.87
    »                       0.87
    Act Density 0.186%

    No Known Activations