    Negative Logits
    "astr"          -0.07
    (blank)         -0.07
    "Storyboard"    -0.07
    "otence"        -0.06
    " mostra"       -0.06
    " contra"       -0.06
    " آینده"        -0.06
    (blank)         -0.06
    "cstring"       -0.06
    " carta"        -0.06
    Positive Logits
    " level"        0.15
    " Level"        0.15
    " levels"       0.13
    "Level"         0.12
    " LEVEL"        0.12
    "level"         0.12
    "-level"        0.11
    " Levels"       0.11
    "Levels"        0.11
    "-Level"        0.11
    Act Density 0.055%

    No Known Activations