    Negative Logits
    Token            Logit
                     -0.08
    "jade"           -0.07
    " kot"           -0.07
    "inq"            -0.07
    " amendment"     -0.07
    " cabeza"        -0.07
    "wahl"           -0.07
    " topical"       -0.07
    "manuel"         -0.07
    "slash"          -0.07
    Positive Logits
    Token            Logit
    "STRUCTION"      0.22
    "structions"     0.22
    "STRUCTIONS"     0.22
    "struction"      0.21
    "stru"           0.15
    "стру"           0.14
    "struct"         0.14
    "струк"          0.13
    "structors"      0.13
    " adeil"         0.13
    Act Density 0.005%

    No Known Activations