    Explanations

    phrases expressing causality or change

    statements indicating change or realization

    Negative Logits
        Token         Logit
        "ahime"       -0.94
        "edia"        -0.81
        "prus"        -0.68
        "gart"        -0.65
        "ounter"      -0.64
        " [|"         -0.64
        "rique"       -0.64
        "uca"         -0.62
        "racuse"      -0.62
        "rican"       -0.61
    Positive Logits
        Token         Logit
        " except"     1.03
        "together"    0.99
        " together"   0.89
        " toget"      0.83
        " facets"     0.71
        " alike"      0.71
        " besides"    0.70
        " hoop"       0.68
        "Ùĩ"          0.67
        " nodd"       0.66
    Activation Density: 0.279%

    No Known Activations