INDEX
    Explanations

    phrases related to abstract concepts

    phrases that indicate comprehension or knowledge

    Negative Logits
    Token             Logit
    "onies"           -0.83
    "nar"             -0.70
    "ads"             -0.68
    "sites"           -0.63
    "drops"           -0.61
    "adding"          -0.61
    "Textures"        -0.61
    " piling"         -0.60
    "yss"             -0.60
    "quer"            -0.60
    Positive Logits
    Token             Logit
    "ually"            0.94
    " Understanding"   0.86
    " understanding"   0.85
    " comprehension"   0.80
    "ably"             0.80
    " Understand"      0.78
    " how"             0.74
    " HOW"             0.74
    "FUL"              0.74
    "displayText"      0.72
    Activation Density: 0.011%

    No Known Activations