INDEX
    Explanations

    words indicating significant actions or transformations

    Negative Logits
    eed
    -0.15
    wart
    -0.15
     Hof
    -0.14
    ience
    -0.14
    ournals
    -0.14
     Marks
    -0.13
     Stein
    -0.13
    ivé
    -0.13
     Chapters
    -0.13
     Dice
    -0.13
    Positive Logits
    ACHI
    0.18
    .Handled
    0.15
    æ¯Ľ
    0.15
    finger
    0.14
    ãĥ³ãĥģ
    0.14
     convers
    0.14
    éĸ
    0.14
    breadcrumb
    0.14
    ãĥ¼ãĥĨ
    0.14
     ÎķÏĢι
    0.14
    Act Density 0.023%

    No Known Activations