INDEX
    Explanations

    terms related to causes and their effects

    Negative Logits
    " caus"       -0.20
    " cause"      -0.19
    " causal"     -0.19
    "Cause"       -0.18
    " Cause"      -0.17
    " causa"      -0.17
    " caused"     -0.17
    "ti"          -0.17
    "ize"         -0.16
    " causing"    -0.16
    Positive Logits
    "-effect"     0.31
    " cél"        0.29
    " cele"       0.27
    "effect"      0.20
    "ways"        0.19
    "way"         0.18
    " celebr"     0.18
    "lesh"        0.18
    "lessly"      0.17
    "égorie"      0.17
    Act Density 0.024%

    No Known Activations