INDEX
    Explanations

    pronouns and words indicating specific relationships and references in context

    Negative Logits
    "wan"              -0.16
    "らく"             -0.15
    "łem"              -0.15
    "IFA"              -0.14
    "лиц"              -0.13
    "andan"            -0.13
    "lava"             -0.13
    "INVAL"            -0.13
    "енту"             -0.13
    "onta"             -0.13

    Positive Logits
    " understanding"   0.47
    " understand"      0.43
    " Understanding"   0.41
    " understands"     0.37
    "Understanding"    0.37
    " Understand"      0.36
    " understood"      0.33
    " comprehension"   0.32
    "理解"             0.31
    " know"            0.29
    Activation Density: 0.048%

    No Known Activations