    Explanations

    code and file formats

    Negative Logits
    "Cancel"  0.68
    "AST"  0.64
    "িষ"  0.60
    " Informed"  0.58
    " informed"  0.57
    " cancelar"  0.57
    " honesty"  0.56
    "лкой"  0.55
    " shocked"  0.54
    "Cancelar"  0.54
    Positive Logits
    " Pozn"  0.62
    "wn"  0.56
    " نگر"  0.55
    "gios"  0.54
    " ময়"  0.53
    "діть"  0.53
    "ماية"  0.52
    "コク"  0.52
    " panneaux"  0.52
    [token not shown]  0.52
    Activation Density: 0.001%

    No Known Activations