INDEX
    Explanations

    phrases related to comprehension and understanding

    Negative Logits
    Token            Logit
    "tagHelper"      -0.70
                     -0.64
    "arbox"          -0.58
    "Rim"            -0.57
    "тари"           -0.57
    "ot"             -0.57
    "الثة"           -0.57
    " colpo"         -0.56
    "ash"            -0.56
    " Grossman"      -0.56
    Positive Logits
    Token               Logit
    "understand"        1.69
    " Understand"       1.65
    "Understand"        1.64
    " understand"       1.62
    " understands"      1.57
    " understanding"    1.52
    " understood"       1.42
    "understood"        1.41
    " understandings"   1.41
    "understanding"     1.41
    Activation Density: 0.091%

    No Known Activations