INDEX
    Negative Logits
    Token            Logit
    ";width"         -0.08
    (not shown)      -0.07
    "ender"          -0.07
    " gravity"       -0.07
    " opat"          -0.06
    " Indoor"        -0.06
    " مول"           -0.06
    "forget"         -0.06
    "ffb"            -0.06
    "ологія"         -0.06
    Positive Logits
    Token            Logit
    "мін"            0.07
    " професси"      0.07
    " Sect"          0.06
    (not shown)      0.06
    " 연구"          0.06
    " pig"           0.06
    " Polish"        0.06
    "Indices"        0.06
    " Size"          0.06
    "иться"          0.06
    Activation Density: 0.002%

    No Known Activations