INDEX
    Explanations

    phrases indicating observation or speculation

    phrases indicating appearances or predictions about situations

    Negative Logits
    Token         Logit
    "ocaust"      -0.90
    "velength"    -0.87
    "iqueness"    -0.79
    "cial"        -0.78
    "akable"      -0.76
    "ategory"     -0.71
    "cart"        -0.71
    "utch"        -0.66
    "cession"     -0.65
    "together"    -0.65
    Positive Logits
    Token           Logit
    " Rasmussen"    0.79
    "TOR"           0.71
    " unlikely"     0.68
    " slowing"      0.65
    " FSA"          0.65
    " Chimera"      0.63
    " Melania"      0.61
    " whoever"      0.60
    " Devin"        0.59
    " McCabe"       0.59
    Activation Density: 0.128%

    No Known Activations