INDEX
    Explanations

    words associated with endings or conclusions

    closing punctuation marks, indicating the end of sentences or segments

    Negative Logits
    ategory      -0.76
    ppo          -0.73
    IGHTS        -0.72
    kaya         -0.67
    kefeller     -0.66
     absor       -0.65
    alcohol      -0.65
    PDATE        -0.64
     underest    -0.64
    terness      -0.62
    Positive Logits
    angered      1.02
    orph         1.01
    angering     0.95
    ragon        0.94
    lich         0.94
    urance       0.93
    erer         0.92
    ering        0.88
    ocrin        0.86
    ulum         0.85
    Activation Density: 0.031%
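    The density figure is presumably the share of token positions on which this feature activates; 0.031% corresponds to roughly 31 firing positions per 100,000 tokens. The sketch below illustrates that arithmetic under one assumption that the page does not state: a position counts as "active" when the feature's activation is strictly greater than zero. The helper `activation_density` and the toy data are hypothetical and do not reflect how this dashboard actually computes the value.

```python
def activation_density(activations):
    """Fraction of token positions where the feature fires.

    Assumption (hypothetical): a position counts as firing when its
    activation is strictly greater than zero.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > 0)
    return firing / len(activations)


# Hypothetical toy data: 100,000 token positions, 31 of which fire.
acts = [0.0] * 100_000
for i in range(0, 31 * 1000, 1000):  # indices 0, 1000, ..., 30000 (31 positions)
    acts[i] = 1.5

density = activation_density(acts)
print(f"{density:.3%}")  # -> 0.031%
```

    Using a strict `> 0` threshold fits ReLU-style sparse features, where inactive positions are exactly zero. A dashboard could instead use a small positive cutoff, which would lower the reported density.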

    No Known Activations