INDEX
    Explanations

    expressions of surprise, skepticism, or emotional reactions

    Negative Logits
    "ngth"        -0.76
    "elta"        -0.74
    "ictionary"   -0.70
    " nomine"     -0.69
    "semble"      -0.67
    "rive"        -0.66
    " rou"        -0.66
    "erville"     -0.65
    "reau"        -0.65
    "ioxide"      -0.65
    Positive Logits
    "imaru"         0.93
    " considering"  0.93
    " seeing"       0.86
    " why"          0.72
    " how"          0.69
    " Stras"        0.66
    "why"           0.65
    " because"      0.63
    "SPONSORED"     0.62
    "because"       0.62
    Activation Density: 0.290%

    No Known Activations