    Negative Logits
        Token          Logit
        expertly       -0.12
        meticulously   -0.12
        Carefully      -0.12
        cleverly       -0.12
        lovingly       -0.12
        carefully      -0.11
        wonderfully    -0.11
        Further        -0.11
        beautifully    -0.11
        neatly         -0.10
    Positive Logits
        Token          Logit
        wrote          0.22
        think          0.17
        specify        0.16
        define         0.16
        analyze        0.15
        написал        0.15
        formulate      0.15
        stating        0.14
        state          0.14
        identify       0.14
    Activation Density: 0.032%

    No Known Activations