INDEX
    Explanations

    instances of teaching and guidance in complex situations

    Negative Logits
    endi            -0.16
    dex             -0.16
    bic             -0.15
    iale            -0.15
    enderit         -0.14
    Wunused         -0.14
    kili            -0.14
    ãģĵ             -0.14
    ìŬ              -0.14
    erea            -0.14
    Positive Logits
     explaining     0.21
     Explain        0.19
     explain        0.19
    explain         0.18
     explanation    0.16
    znám            0.16
    aware           0.15
    説æĺİ            0.15
     explanations   0.14
     explained      0.14
    Act Density 0.363%

    No Known Activations