INDEX
    Explanations

    conjunctions and words indicating connections or relationships

    Negative Logits (tokens quoted; a leading space is part of the token)
    Token      Logit
    "19"       -0.06
    "29"       -0.06
    "55"       -0.06
    "vy"       -0.06
    "48"       -0.06
    " hakk"    -0.06
    " Hans"    -0.06
    " Hakk"    -0.06
    "54"       -0.06
    "53"       -0.06

    Positive Logits (tokens quoted; a leading space is part of the token)
    Token      Logit
    " full"    0.07
    "full"     0.07
    "eh"       0.07
    "isté"     0.06
    "rost"     0.06
    "ew"       0.06
    "ンブ"     0.06
    "erra"     0.06
    "veral"    0.06
    "orian"    0.06
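
    Logit lists like the two above are commonly produced by multiplying a feature's decoder direction by the model's unembedding matrix and reporting the lowest and highest resulting values. This page does not state its exact procedure, so the sketch below is an assumption, and the names `top_bottom_logits`, `w_dec_row`, and `W_U` are hypothetical.

```python
import numpy as np

def top_bottom_logits(w_dec_row, W_U, vocab, k=10):
    """Project one feature's decoder direction onto the vocabulary (hypothetical helper).

    w_dec_row: (d_model,) decoder vector for the feature (assumed input)
    W_U:       (d_model, vocab_size) unembedding matrix (assumed input)
    vocab:     list of token strings, indexed like the columns of W_U
    Returns (negative, positive): the k lowest and k highest (token, logit) pairs.
    """
    logits = np.asarray(w_dec_row) @ np.asarray(W_U)  # shape (vocab_size,)
    order = np.argsort(logits)                         # ascending by logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy usage with a 2-dimensional model and a 4-token vocabulary.
w = np.array([1.0, 0.0])
W_U = np.array([[0.3, -0.2, 0.1, 0.0],
                [5.0, 5.0, 5.0, 5.0]])
neg, pos = top_bottom_logits(w, W_U, ["a", "b", "c", "d"], k=2)
```

    Because the lists are read straight off a single matrix product, they describe what the feature pushes the output toward, not the contexts in which it fires.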
    Activation density: 0.018%
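
    Activation density is read here as the fraction of token positions on which the feature is active. A density of 0.018% is 0.00018, or roughly 18 tokens in every 100,000. The sketch below assumes "active" means an activation strictly above zero; the page does not state its threshold, and `activation_density` is a hypothetical helper.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of token positions whose feature activation exceeds `threshold`.

    The default threshold of 0.0 is an assumption, not a documented setting.
    """
    acts = np.asarray(acts)
    if acts.size == 0:
        raise ValueError("need at least one activation")
    return np.count_nonzero(acts > threshold) / acts.size

# 18 active positions out of 100,000 gives a density of 0.018%.
acts = np.zeros(100_000)
acts[:18] = 0.5
density = activation_density(acts)
```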

    No Known Activations