    Explanations

    short phrases related to current events or news headlines

    sentence endings or punctuation marks in discussions of serious topics

    Negative Logits
     predec           -0.78
    iste              -0.71
     unprotected      -0.69
     manif            -0.68
     explan           -0.67
     defe             -0.66
     allocation       -0.65
     mosqu            -0.65
     shaving          -0.63
     multiplication   -0.62

    Positive Logits
     Their            0.90
     They             0.89
     Besides          0.85
     Its              0.82
     *)               0.82
     Such             0.82
     These            0.81
     Fortunately      0.81
     Specifically     0.81
     Additionally     0.81

    Activation Density: 0.487%

    No Known Activations