    Negative Logits
    Token           Logit
    " Disneyland"   -0.08
    "idian"         -0.08
    " Knot"         -0.08
    " benign"       -0.07
    " Nud"          -0.07
    " mutant"       -0.07
    "\tDEBUG"       -0.07
    " steam"        -0.07
    " harmless"     -0.07
    " лица"         -0.07   (Russian: "faces" / "persons")
    Positive Logits
    Token           Logit
    "(width"         0.11
    "-width"         0.11
    " largura"       0.11   (Portuguese: "width")
    "-columns"       0.10
    " width"         0.10
    " Width"         0.10
    " Columns"       0.10
    " تقس"           0.10
    " widths"        0.10
                     0.10
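Logit lists like the two above are commonly read as the direct effect of a feature's decoder direction on the output vocabulary. The page does not say how these values were computed, so the sketch below is an assumption: it projects a decoder vector through an unembedding matrix, sorts the scores, and reports the extremes. `top_logit_tokens`, `w_dec_f` and `W_U` are hypothetical names, and the toy matrices are not this feature's real weights.

```python
import numpy as np


def top_logit_tokens(w_dec_f, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, logit) pairs.

    Hypothetical helper. Assumes the feature's effect on the output logits is
    approximated by projecting its decoder direction w_dec_f (shape [d_model])
    through the unembedding matrix W_U (shape [d_model, d_vocab]), ignoring
    the final layer norm.
    """
    logits = w_dec_f @ W_U                   # one score per vocabulary token
    order = np.argsort(logits)               # indices, most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy data (not this feature's real weights): d_model = 2, four-token vocabulary.
vocab = [" width", " benign", "(width", " steam"]
W_U = np.array([[0.10, -0.07, 0.11, -0.08],
                [0.50,  0.50, 0.50,  0.50]])
w_dec_f = np.array([1.0, 0.0])               # decoder direction aligned with row 0

neg, pos = top_logit_tokens(w_dec_f, W_U, vocab, k=2)
print(neg)  # most negative first
print(pos)  # most positive first
```

Sorting once and slicing from both ends keeps the negative and positive lists consistent with each other; a real dashboard may also fold in layer-norm scaling, which this sketch leaves out.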
    Activation density: 0.004%

    No Known Activations
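The density figure of 0.004% is easier to read as a count: as a fraction it is 0.00004, which is about 4 tokens in 100,000, or roughly 40 per million. The sketch below assumes density means the percentage of sampled tokens on which the feature's activation is above zero; the page does not define the term. `activation_density` is a hypothetical helper, and the data is a toy sample.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Percentage of sampled tokens on which a feature's activation exceeds threshold.

    Hypothetical helper: acts is a 1-D array holding one feature's activation
    at every token in a sample; density counts the tokens where it fires.
    """
    acts = np.asarray(acts)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size


# Toy sample: the feature fires on 4 of 100,000 tokens.
acts = np.zeros(100_000)
acts[[10, 500, 20_000, 99_999]] = 1.0

density = activation_density(acts)              # percent
per_million = density / 100 * 1_000_000          # expected firings per 1M tokens
print(f"{density:.3f}% ~ {per_million:.0f} tokens per million")
```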