    Negative Logits
      " proib"        -0.08
      " Lucifer"      -0.08
      " pervers"      -0.08
      " digestion"    -0.08
      " дон"          -0.07
      "ktime"         -0.07
      " приш"         -0.07
      "atoj"          -0.07
      " Law"          -0.07
      " अपराध"        -0.07
    Positive Logits
      " editions"     0.10
      " edition"      0.10
      "Edition"       0.09
      "Instance"      0.08
      "ید"            0.08
      " éditions"     0.08
      (not rendered)  0.08
      "edition"       0.08
      " экземпля"     0.08
      " Edition"      0.08
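The positive and negative logit lists rank vocabulary tokens by how strongly this feature pushes their output logits up or down. One common way to produce such lists (an assumption here, since the page does not state its method) is to project the feature's decoder direction through the model's unembedding matrix and keep the top-k and bottom-k entries. The sketch below uses plain Python lists; `logit_effects`, `top_and_bottom`, and the toy weights are hypothetical.

```python
from typing import List, Sequence, Tuple


def logit_effects(direction: Sequence[float], w_u: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical helper: project a feature's decoder direction
    (length d_model) through an unembedding matrix w_u of shape
    (d_model, vocab_size), giving one logit effect per vocabulary token."""
    if len(direction) != len(w_u):
        raise ValueError("direction length must equal the number of rows in w_u")
    vocab_size = len(w_u[0])
    return [
        sum(direction[i] * w_u[i][j] for i in range(len(direction)))
        for j in range(vocab_size)
    ]


def top_and_bottom(
    effects: Sequence[float], vocab: Sequence[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, effect) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # smallest (most negative) effects first
    positive = ranked[::-1][:k]    # largest (most positive) effects first
    return positive, negative


# Toy example with made-up weights (d_model = 2, vocab_size = 4).
vocab = [" edition", " Lucifer", " Law", "Instance"]
direction = [1.0, -0.5]
w_u = [
    [0.10, -0.04, -0.02, 0.06],
    [0.00, 0.08, 0.10, -0.04],
]
effects = logit_effects(direction, w_u)
positive, negative = top_and_bottom(effects, vocab, k=2)
print("positive:", positive)
print("negative:", negative)
```

In a real model, `w_u` would be the full unembedding matrix and `direction` the feature's decoder vector (for example, a row of a sparse autoencoder's decoder, if that is how the feature was obtained). Some pipelines also account for the final layer norm, which this sketch ignores.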
    Activation density: 0.004%

    No known activations
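Activation density is usually read as the fraction of tokens in an evaluation sample on which the feature fires, so 0.004% corresponds to roughly 4 active tokens per 100,000. A minimal sketch, assuming "fires" means an activation strictly above a threshold of 0 (the page gives neither its threshold nor its sample size); `activation_density` is a hypothetical helper and the sample is made up.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: fraction of tokens whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up sample: 100,000 tokens, of which 4 have a positive activation.
sample = [0.0] * 100_000
for idx in (10, 2_500, 40_000, 99_999):
    sample[idx] = 1.3
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.004%
```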