    Explanations

    words that follow "decline", "down", or "illegal"

    Negative Logits
     carer  0.43
     banknotes  0.42
     bourgeoisie  0.40
     impotence  0.38
     diarrhoea  0.37
     difficulties  0.37
     например  0.36
     strikingly  0.36
     debtors  0.35
     morphism  0.35
    Positive Logits
     kiddos  0.62
    はもちろん  0.57
     हमारी  0.54
    جميع  0.54
     всех  0.53
    當然  0.53
     våra  0.53
     tentunya  0.52
     تمامی  0.52
     всіх  0.52
    Activation Density: 0.090%

    No Known Activations