    Negative Logits
     screens          -0.07
     Spect            -0.06
     seizure          -0.06
    олот              -0.06
     weld             -0.06
    (token missing)   -0.06
    =".$_             -0.06
    Pizza             -0.06
     Mad              -0.06
     Roma             -0.06
    Positive Logits
     imper            0.07
    .integration      0.06
    (^)(              0.06
     embedding        0.06
    ційний            0.06
     categorical      0.06
     onDelete         0.06
    (token missing)   0.06
     hưởng            0.06
    ى                 0.06
    Activation Density: 0.006%

    No Known Activations