    Explanations

    phrases indicating proof and demonstration in academic contexts

    Negative Logits
    Token            Logit
    "ouden"          -0.13
    "etik"           -0.13
    "ivor"           -0.13
    "agnost"         -0.13
    " Recap"         -0.13
    " elucid"        -0.13
    "agar"           -0.13
    " summarizes"    -0.13
    "137"            -0.12
    "aby"            -0.12

    Positive Logits
    Token            Logit
    " shown"         0.80
    " show"          0.76
    " showed"        0.69
    "show"           0.68
    "shown"          0.67
    "-show"          0.64
    " Show"          0.62
    ".show"          0.61
    " SHOW"          0.61
    " prove"         0.60
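
The positive-logit list above is dominated by surface variants of one word. A minimal sketch of how one might check that, assuming a simple normalization (strip surrounding spaces, "." and "-", then lowercase); the `normalize` helper and the grouping are illustrative assumptions and not part of the dashboard itself:

```python
from collections import defaultdict

# Positive-logit tokens and values copied from the list above.
# Leading spaces and punctuation are part of each token.
positive_logits = [
    (" shown", 0.80),
    (" show", 0.76),
    (" showed", 0.69),
    ("show", 0.68),
    ("shown", 0.67),
    ("-show", 0.64),
    (" Show", 0.62),
    (".show", 0.61),
    (" SHOW", 0.61),
    (" prove", 0.60),
]


def normalize(token: str) -> str:
    """Hypothetical helper: drop surrounding spaces, '.' and '-', then lowercase."""
    return token.strip(" .-").lower()


# Group the logit values by normalized surface form.
groups = defaultdict(list)
for token, logit in positive_logits:
    groups[normalize(token)].append(logit)

for form, values in groups.items():
    print(f"{form}: {len(values)} variant(s), max logit {max(values):.2f}")
```

Under this normalization, six of the ten tokens collapse to "show" and two to "shown", with "showed" and "prove" appearing once each, which is consistent with the explanation given for this feature.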

    Activation Density: 0.261%

    No Known Activations