    Explanations

    phrases related to causality or explanation

    causal phrases or expressions indicating reasons for something occurring

    Negative Logits
    Token                 Logit
    "=-=-=-=-=-=-=-=-"    -0.80
    "arest"               -0.79
    "asp"                 -0.73
    "arro"                -0.71
    " Sheep"              -0.71
    "chip"                -0.69
    "adel"                -0.68
    " Leone"              -0.68
    "hov"                 -0.68
    "ivas"                -0.65
    Positive Logits
    Token                 Logit
    " diligence"          1.16
    "giving"              0.93
    "itiz"                0.75
    " cancell"            0.75
    " dilig"              0.71
    "gers"                0.70
    "*/("                 0.69
    "llers"               0.69
    "wcs"                 0.65
    ")=("                 0.65
    Activation Density: 0.021%

    No Known Activations