INDEX
    Explanations

    phrases or questions related to random or hypothetical scenarios

    sentences that contain questions or rhetorical inquiries

    Negative Logits
        Token            Logit
        " dialect"       -0.74
        " embassy"       -0.71
        "Forge"          -0.68
        " unilaterally"  -0.67
        "ilib"           -0.67
        " misrepresent"  -0.66
        " UNCLASSIFIED"  -0.64
        " unilateral"    -0.63
        " materially"    -0.61
        " bloc"          -0.61
    Positive Logits
        Token            Logit
        " Turns"         0.79
        "Enter"          0.76
        " Mehran"        0.72
        "joice"          0.72
        "hiro"           0.71
        "Luckily"        0.67
        " Garmin"        0.67
        "STON"           0.66
        " Franch"        0.65
        "Thankfully"     0.65
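    The two logit lists rank the tokens this feature most suppresses and most promotes. A minimal sketch of how such lists are commonly produced, assuming (this page does not confirm it) that the values come from projecting the feature's decoder vector through the model's unembedding matrix, with no layer norm, and taking the bottom and top k entries. The names `decoder_vec`, `W_U`, `vocab` and `top_logit_effects` are hypothetical, and the toy data is random.

```python
import numpy as np


def top_logit_effects(decoder_vec, W_U, vocab, k=10):
    """Return the k lowest and k highest (token, logit) pairs.

    Assumption (hypothetical): logit effect = decoder_vec @ W_U,
    with W_U shaped (d_model, vocab_size) and no layer norm applied.
    """
    logits = decoder_vec @ W_U            # one score per vocabulary token
    order = np.argsort(logits)            # indices sorted ascending
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example: hypothetical 6-token vocabulary, d_model = 4, random weights.
rng = np.random.default_rng(0)
vocab = [" dialect", " embassy", "Forge", " Turns", "Enter", " Mehran"]
W_U = rng.normal(size=(4, len(vocab)))
decoder_vec = rng.normal(size=4)
neg, pos = top_logit_effects(decoder_vec, W_U, vocab, k=3)
for tok, val in neg + pos:
    print(f"{tok!r:>12} {val:+.2f}")
```

    Sorting once with `argsort` and reading both ends avoids two separate top-k passes; for a real vocabulary of tens of thousands of tokens, `np.argpartition` would be a cheaper alternative.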
    Activation Density: 0.825%
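    The density figure is presumably the percentage of sampled token positions on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above zero; the function name `activation_density`, the `threshold` parameter and the 40,000-token sample are all hypothetical, chosen so the toy result matches 0.825%.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`.

    acts: feature activations, one value per token position (any shape).
    Assumption (hypothetical): a feature counts as active when act > threshold.
    """
    acts = np.asarray(acts)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size


# Hypothetical sample: 40,000 token positions, 330 of them active.
acts = np.zeros(40_000)
acts[:330] = 1.0
density = activation_density(acts)
print(f"{density:.3f}%")
```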

    No Known Activations