    Negative Logits
    Token               Logit
    "stv"               0.52
    " Expense"          0.49
    " dilwale"          0.48
    " dxe"              0.48
    ")**"               0.47
    " comparability"    0.47
    " loneliness"       0.46
    " Isolation"        0.46
    "IntValue"          0.46
    " linearity"        0.46
    Positive Logits
    Token               Logit
    "نا"                0.54
    "ло"                0.50
    "لى"                0.48
    "бли"               0.48
    "ч"                 0.47
    "лу"                0.46
    "про"               0.46
    " maior"            0.45
    "نامه"              0.45
    "Зна"               0.45
    Activation Density: 0.005%

    No Known Activations