INDEX
    Explanations

    mentions of the word "Hard" at relatively high activation levels

    Negative Logits
    "uality"          -0.56
    "allery"          -0.47
    "umbn"            -0.46
    "ĸļ"              -0.45
    "uations"         -0.44
    " Emir"           -0.43
    "oration"         -0.41
    " Shutterstock"   -0.40
    " Mens"           -0.40
    "orative"         -0.39
    Positive Logits
    "ened"      0.64
    "ness"      0.56
    "ball"      0.53
    "core"      0.52
    "iness"     0.52
    "iest"      0.51
    "ening"     0.51
    "Reply"     0.50
    "ware"      0.49
    "castle"    0.48
    Activation Density: 16.836%

    No Known Activations