    Explanations

    leaving or ending phrases

    Negative Logits
     genocide       0.46
     neben          0.45
     lotus          0.45
     Chalmers       0.44
     yoga           0.44
     humains        0.43
     bookkeeping    0.43
     поряд          0.43
     персо          0.42
     Lotus          0.42
    Positive Logits
    NGC             0.48
    Jeśli           0.41
    Treas           0.39
    Prop            0.38
    Why             0.37
    why             0.36
    ુપ              0.36
    ReLU            0.36
    What            0.35
    Would           0.35
    Activation Density: 0.000%

    No Known Activations