    Explanations

    positive adjectives emphasizing significance, value, and difficulty

    Negative Logits
    "pushFollow"        -0.54
    "ArrowToggle"       -0.51
    "doPost"            -0.50
    "addPreferredGap"   -0.49
    "."                 -0.44
    "WithFormat"        -0.43
    "Билгалдахарш"      -0.43
    "EndInit"           -0.42
    "extAlignment"      -0.42
    " cors"             -0.40
    Positive Logits
    " autorytatywna"    0.46
    "fromnode"          0.43
    "تقاوى"             0.43
    " anzi"             0.42
    " argint"           0.41
    "ientras"           0.41
    "angliski"          0.40
    " Вікіпе"           0.39
    " figliu"           0.39
    " soldati"          0.38
    Activation Density: 0.015%

    No Known Activations