    Explanations

    questions starting with "Why"

    Negative Logits
    "ibaba"           -0.69
    "semble"          -0.67
    " Roller"         -0.66
    "izen"            -0.66
    "ymph"            -0.64
    "lator"           -0.63
    " consolation"    -0.62
    "jri"             -0.59
    "culosis"         -0.58
    "mun"             -0.58
    Positive Logits
    " why"            1.28
    "why"             1.11
    " WHY"            1.09
    "abl"             0.93
    " bother"         0.88
    "Why"             0.87
    "Origin"          0.80
    " Why"            0.78
    "forth"           0.72
    " motives"        0.69
    Act Density 3.896%

    No Known Activations