    Explanations

    is are have

    Negative Logits
    Token           Logit
    "woman"         -0.07
    "stdafx"        -0.07
    "-wide"         -0.07
    "nas"           -0.06
    "]);"           -0.06
    "menus"         -0.06
    "main"          -0.06
    ".am"           -0.06
    "_checkbox"     -0.06
    "Mother"        -0.06
    Positive Logits
    Token           Logit
    " obě"          0.06
    "issenschaft"   0.06
    "*g"            0.06
    " overclock"    0.06
    "_interp"       0.06
    "、お"            0.06
    " firing"       0.06
    " všechno"      0.06
    "ọi"            0.06
    "(interp"       0.06
    Activation Density: 0.003%

    No Known Activations