INDEX
    Explanations

    questions ending with a question mark

    questions about understanding implications and dynamics

    Negative Logits
    " toxin"          -0.69
    " bul"            -0.66
    "ality"           -0.66
    "iannopoulos"     -0.65
    "oki"             -0.65
    "uther"           -0.65
    " evening"        -0.65
    " contr"          -0.64
    "onte"            -0.63
    " background"     -0.63
    Positive Logits
    " Well"           1.56
    " Firstly"        1.35
    " Probably"       1.31
    " Quite"          1.29
    " Answer"         1.23
    " Certainly"      1.23
    " Apparently"     1.22
    " Possibly"       1.20
    " Turns"          1.20
    " Obviously"      1.20
    Act Density 0.130%

    No Known Activations