INDEX
    Explanations

    words related to offering potential reasons or justifications

    phrases related to providing explanations or justifications

    Negative Logits
    "illet"     -0.80
    "estial"    -0.76
    "zig"       -0.73
    "opers"     -0.72
    "sembly"    -0.72
    "ibaba"     -0.71
    "shr"       -0.70
    "oned"      -0.69
    "ymph"      -0.69
    "raid"      -0.68
    Positive Logits
    " WHY"             1.07
    " why"             1.00
    " explanations"    0.91
    "why"              0.86
    " explanation"     0.85
    " thereof"         0.78
    " rationale"       0.75
    " explan"          0.75
    " explaining"      0.72
    "ation"            0.71
    Act Density 0.044%

    No Known Activations