INDEX
    Explanations

    phrases indicating surprising or unexpected revelations

    phrases that indicate a consequential or revealing outcome

    Negative Logits
    ropolitan    -0.74
    atana        -0.72
     Colleg      -0.69
    è¦ļéĨĴ       -0.68
    ordan        -0.66
    riot         -0.65
    riots        -0.65
    ities        -0.64
    lain         -0.63
    icipated     -0.61
    Positive Logits
    ĸ                   0.76
    Ī                   0.74
    WT                  0.71
    ij                  0.71
    terday              0.70
    \\\\\\\\\\\\\\\\    0.69
    ¸                   0.67
    coat                0.66
    Ĺ                   0.66
     beet               0.66
    Activation Density: 0.023%

    No Known Activations