    Explanations

    questions and statements about understanding and explaining concepts

    Negative Logits
        Token         Logit
        "asthan"      -0.17
        "boro"        -0.15
        "igram"       -0.14
        "ifax"        -0.14
        " notice"     -0.14
        " Shuttle"    -0.14
        "orning"      -0.13
        "лÑĥ"         -0.13
        "=-=-"        -0.13
        "ặ"           -0.13
    Positive Logits
        Token             Logit
        " explanations"   0.61
        " explanation"    0.60
        " explaining"     0.58
        " explain"        0.56
        " explained"      0.54
        " explains"       0.51
        " Explanation"    0.50
        " Explain"        0.49
        "explained"       0.49
        "Explanation"     0.49
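    The page does not say how these logit values were derived. One common approach for sparse-autoencoder features is to project the feature's decoder direction through the model's unembedding matrix and rank the vocabulary by the result. The sketch below assumes that approach. The names `w_dec_row`, `w_unembed`, and `top_logit_effects` are hypothetical, and the toy numbers are made up, not the real feature weights.

```python
import numpy as np

def top_logit_effects(w_dec_row, w_unembed, vocab, k=10):
    """Rank tokens by how strongly a feature's decoder direction pushes their logits.

    Hypothetical inputs (assumed, not confirmed by the dashboard):
      w_dec_row: shape (d_model,), the feature's decoder vector
      w_unembed: shape (d_model, vocab_size), the model's unembedding matrix
      vocab:     list of token strings, length vocab_size
    """
    effects = w_dec_row @ w_unembed   # one logit effect per vocabulary token
    order = np.argsort(effects)       # indices sorted by ascending effect
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with made-up weights (not the real feature).
vocab = [" explain", " notice", "boro", " explanation"]
w_dec_row = np.array([1.0, 0.0])
w_unembed = np.array([[0.56, -0.14, -0.15, 0.60],
                      [0.00, 0.00, 0.00, 0.00]])
neg, pos = top_logit_effects(w_dec_row, w_unembed, vocab, k=2)
print(neg)  # most suppressed tokens first
print(pos)  # most promoted tokens first
```

    Under this assumption, a positive value means the feature raises that token's logit when it fires, which fits the "explain" family dominating the positive list.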
    Activation Density: 0.040%
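    A density of 0.040% means the feature fires on roughly 4 of every 10,000 tokens. The sketch below shows that arithmetic, assuming density is the fraction of tokens whose activation exceeds zero. The dashboard does not state its exact threshold, and `activation_density` is a hypothetical helper.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of tokens on which a feature is active.

    Assumption: a token counts as active when its activation is strictly
    greater than `threshold` (0.0 by default).
    """
    activations = np.asarray(activations)
    return float(np.mean(activations > threshold))

# 10,000 hypothetical token positions, of which 4 are active.
acts = np.zeros(10_000)
acts[[10, 2_000, 5_500, 9_999]] = 1.3
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.040%
```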

    No Known Activations