INDEX
    Explanations

    phrases related to explanations or reasoning

    phrases that clarify reasons or justifications

    Negative Logits
    Token       Logit
    "emies"     -0.82
    "ille"      -0.78
    "heit"      -0.73
    "ontent"    -0.72
    "jab"       -0.72
    "nets"      -0.72
    "ctors"     -0.69
    "ngth"      -0.68
    "ionics"    -0.68
    "estial"    -0.67
    Positive Logits
    Token                 Logit
    " why"                1.70
    "why"                 1.31
    " WHY"                1.21
    " discrepancies"      0.98
    " how"                0.96
    " Why"                0.91
    " inconsistencies"    0.89
    "Why"                 0.85
    " reluctance"         0.84
    " variance"           0.82
    Activation Density: 0.114%

    No Known Activations