INDEX
    Explanations

    sentences describing people's actions or states

    phrases indicating the presence and actions of people

    Negative Logits
    Token             Logit
    xxxx              -0.76
     predecessor      -0.69
    achment           -0.68
    TX                -0.65
     srfAttach        -0.63
    imental           -0.63
    ONSORED           -0.63
    verse             -0.63
     saga             -0.62
     fiasco           -0.61

    Positive Logits
    Token             Logit
     clam              1.09
     understandably    0.95
    ateurs             0.95
     alike             0.93
     accustomed        0.91
     beware            0.90
     routinely         0.90
     eager             0.87
     everywhere        0.84
     encouraged        0.83
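The two logit lists above rank output tokens by how strongly this feature pushes their logits down (negative) or up (positive). A minimal Python sketch of that ranking step, assuming each token's logit effect is already available as a number in a dictionary; the helper name `top_logits` and the parameter `k` are hypothetical illustrations, not the dashboard's own API, and how the per-token effects are computed is not covered here.

```python
def top_logits(effects: dict[str, float], k: int = 10):
    """Return the k most negative and k most positive (token, logit) pairs.

    Hypothetical helper: `effects` maps a token string (leading spaces are
    significant) to that token's logit effect for the feature.
    """
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by logit
    negative = ranked[:k]           # smallest values first
    positive = ranked[::-1][:k]     # largest values first
    return negative, positive


if __name__ == "__main__":
    # A few values copied from the lists above.
    sample = {" clam": 1.09, " eager": 0.87, "xxxx": -0.76, " saga": -0.62}
    neg, pos = top_logits(sample, k=2)
    print(neg)  # [('xxxx', -0.76), (' saga', -0.62)]
    print(pos)  # [(' clam', 1.09), (' eager', 0.87)]
```

Sorting once and slicing from both ends keeps the two lists consistent with each other; for very large vocabularies a partial selection (for example `heapq.nsmallest` / `heapq.nlargest`) would avoid a full sort.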

    Activation Density: 0.388%
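Activation density is reported as a percentage. A minimal Python sketch of how such a figure can be computed, assuming it means the share of sampled tokens on which the feature's activation exceeds zero; the function name `activation_density` and the `threshold` parameter are hypothetical, and the dashboard's actual sampling and threshold are not stated on this page.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation is above `threshold` (assumed 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    print(activation_density([0.0, 1.2, 0.0, 0.0]))  # 25.0
```

Under this assumption, a density of 0.388% corresponds to the feature firing on roughly 388 of every 100,000 tokens.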

    No Known Activations