INDEX
    Explanations

    phrases indicating consequence or logical reasoning

    conjunctions that imply causality

    Negative Logits
    Token          Logit
    " Rumble"      -0.69
    " Defenders"   -0.69
    " Feld"        -0.66
    "Fram"         -0.65
    " MM"          -0.64
    " Tacoma"      -0.63
    "MM"           -0.62
    " nurs"        -0.60
    " Twist"       -0.60
    " straw"       -0.59
    Positive Logits
    Token          Logit
    "forth"        1.09
    " facto"       0.79
    "ĵĺ"           0.75
    "ettings"      0.75
    "manuel"       0.74
    "uration"      0.73
    "otent"        0.72
    "ptions"       0.70
    "akings"       0.70
    "xual"         0.70
    Activation Density: 0.018%

    No Known Activations