INDEX
    NEGATIVE LOGITS
    <TEntity         -0.07
     завжди          -0.07
     TripAdvisor     -0.06
    ılım             -0.06
    Avatar           -0.06
    igroup           -0.06
     wła             -0.06
                     -0.06
     واحدة           -0.06
    christ           -0.06
    POSITIVE LOGITS
     minors          0.07
     transforming    0.06
    (piece           0.06
     emiss           0.06
     carpet          0.06
    .Escape          0.06
     meaningful      0.06
     grin            0.06
     damage          0.06
     Delay           0.06
    Activation Density: 0.024%

    No Known Activations