    Explanations

    terms related to understanding and comprehension

    Tokens are shown in quotes so that leading spaces are visible.

    Negative Logits
      [token missing]    -0.49
      "favourite"        -0.47
      "/*"               -0.46
      "nomin"            -0.46
      " favourite"       -0.45
      " surla"           -0.45
      " feroit"          -0.44
      "Favourite"        -0.43
      "mobileqq"         -0.43
      "favourites"       -0.42

    Positive Logits
      " understanding"    2.06
      " Understanding"    1.93
      " understand"       1.91
      "Understanding"     1.84
      "understanding"     1.82
      " Understand"       1.73
      " understands"      1.70
      "understand"        1.66
      "Understand"        1.66
      " understood"       1.63
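    The positive and negative logits above list the output tokens this feature most promotes and most suppresses. The sketch below shows one way such a ranking can be produced. It assumes (this page does not say so) that each score is the dot product of the feature's decoder direction with a token's unembedding vector. The vectors and tokens are made up for illustration, and `logit_effects` and `top_and_bottom` are hypothetical helper names.

```python
# Hypothetical sketch: rank tokens by a feature's direct logit effect.
# ASSUMPTION: effect = dot(feature decoder direction, token unembedding vector).
# All vectors below are toy values, not real model weights.
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot product of the decoder direction with each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most promoted tokens and the k most suppressed tokens."""
    positive = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)[:k]
    negative = sorted(effects.items(), key=lambda kv: kv[1])[:k]
    return positive, negative


# Toy 3-dimensional example; leading spaces in tokens are significant.
decoder_dir = [1.0, 0.5, -0.2]
unembed = {
    " understanding": [2.0, 0.2, 0.0],
    " understand": [1.8, 0.2, 0.0],
    "favourite": [-0.3, -0.2, 0.5],
    " the": [0.1, 0.0, 0.0],
}
pos, neg = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print("Positive:", pos)
print("Negative:", neg)
```

    Sorting twice keeps the two lists independent, so the most negative token always comes first in the negative list, matching the ordering used above.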
    Activation Density: 0.100%
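    This sketch reads activation density as the share of token positions on which the feature's activation is above zero. That definition is an assumption and is not stated on this page. Under it, 0.100% means roughly one firing per 1,000 tokens. The `activation_density` helper and the sample activations are hypothetical.

```python
# Hypothetical sketch: activation density = fraction of token positions
# whose activation exceeds a threshold (ASSUMPTION: threshold 0.0).
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Made-up sample: the feature fires on 2 of 2,000 tokens.
acts = [0.0] * 2000
acts[10] = 3.2
acts[1500] = 0.7
density = activation_density(acts)
print(f"{density:.3%}")
```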

    No Known Activations