INDEX
    Explanations

    phrases related to causation or consequences

    instances of actions or events that lead to consequences

    Negative Logits
    Token          Logit
    "arily"        -0.72
    " toured"      -0.71
    " contrace"    -0.69
    " cared"       -0.66
    " headed"      -0.65
    " topped"      -0.64
    " handled"     -0.64
    " relied"      -0.63
    "BN"           -0.63
    " owed"        -0.63
    Positive Logits
    Token              Logit
    " confirmation"    0.79
    " confusion"       0.75
    " bloodshed"       0.73
    " extinction"      0.71
    " dismissal"       0.70
    " breakthrough"    0.69
    " laughter"        0.69
    " death"           0.69
    "icial"            0.68
    "forth"            0.67
    Activation Density: 0.051%
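
The page does not define the density figure above. A common reading, assumed here, is that it is the fraction of sampled token positions on which the feature's activation is above zero. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires at all, not the mean activation strength.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 token positions, 5 of which fire.
sample = [0.0] * 9995 + [1.3, 0.2, 4.1, 0.7, 2.5]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.051% would correspond to the feature firing on roughly 5 of every 10,000 token positions.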

    No Known Activations