    Explanations

    words related to reasons or explanations for certain events

    references to causation or reasons for events

    Negative Logits
    Token        Logit
    " tick"      -0.66
    " matter"    -0.65
    " adapter"   -0.62
    " spread"    -0.60
    " index"     -0.60
    " shuff"     -0.60
    " dream"     -0.60
    " sw"        -0.59
    " overl"     -0.59
    " ho"        -0.59
    Positive Logits
    Token        Logit
    "due"        4.59
    "because"    1.36
    " due"       1.32
    "Due"        1.31
    " Due"       1.30
    "despite"    1.17
    "given"      1.16
    "since"      1.15
    "thanks"     1.14
    "cause"      1.06
    Activation Density: 0.025%

    No Known Activations