    Negative Logits
     Declar            -0.08
     horiz             -0.07
     horizontal        -0.07
     stream            -0.07
     matur             -0.07
     Höhen             -0.07
     inclusion         -0.07
     lina              -0.07
    Horizontal         -0.06
     declar            -0.06
    Positive Logits
    וכל                 0.08
     इलेक्ट्र            0.08
     электр             0.08
    에게                0.08
                        0.08
     মারা               0.08
     элект              0.08
     electricians       0.08
    казать              0.08
    angnya              0.08
    Activation Density: 0.007%

    No Known Activations