    Explanations

    causal relationships or explanations in the text

    terms related to causation and causal relationships

    Negative Logits
    " Leopard"    -0.82
    "ardless"     -0.68
    "HCR"         -0.66
    " Unicorn"    -0.66
    "Gro"         -0.65
    ">>>>>>>>"    -0.65
    " Flake"      -0.65
    " Pip"        -0.63
    "ushes"       -0.63
    "chip"        -0.62
    Positive Logits
    "ality"       1.13
    " caus"       0.96
    "istically"   0.91
    "ally"        0.89
    "ities"       0.86
    "atorial"     0.85
    "allo"        0.84
    "uristic"     0.79
    " inference"  0.77
    "ually"       0.76
    Activation Density 0.018%

    No Known Activations