INDEX
    Explanations

    statements that refer to a narrative or a sequence of events

    Negative Logits
    Token         Logit
    "ignt"        -0.77
    "orem"        -0.76
    "emale"       -0.73
    "aez"         -0.71
    "inence"      -0.69
    "ynski"       -0.68
    "anyon"       -0.65
    "ardless"     -0.62
    "ategory"     -0.61
    " numeric"    -0.61
    Positive Logits
    Token         Logit
    "telling"     1.33
    "te"          1.13
    "boards"      1.01
    "book"        1.00
    "tell"        0.98
    " arc"        0.97
    "board"       0.94
    " arcs"       0.90
    "books"       0.89
    "boarding"    0.86
    Activation Density: 0.086%

    No Known Activations