INDEX
    Explanations

    questions starting with the word "Why"

    questions beginning with "Why."

    Negative Logits
    ento      -0.76
    Winged    -0.68
    tips      -0.66
    ãĤ´ãĥ³    -0.65
    apixel    -0.64
    çͰ        -0.63
    ãģ®ç      -0.62
    assador   -0.61
    ÃįÃį      -0.60
    oof       -0.59
    Positive Logits
    ?         0.92
     Exactly  0.89
     else     0.84
     matters  0.81
     Matters  0.79
     exactly  0.78
     this     0.77
     Wrong    0.75
     Else     0.74
     such     0.70
    Act Density 0.110%

    No Known Activations