INDEX
    Explanations

    phrases indicating hierarchy or ranking

    Negative Logits
    " ligiloj"            -0.47
    "krieg"               -0.40
    "PropertyChanging"    -0.40
    "Ответить"            -0.40
    " Akismet"            -0.39
    " experiments"        -0.38
    " erfaring"           -0.38
    "wpdb"                -0.37
    "GNUC"                -0.37
    " tjen"               -0.36
    Positive Logits
    " top"                0.95
    "Top"                 0.93
    "top"                 0.92
    " tops"               0.87
    " Top"                0.85
    " Tops"               0.83
    "Topping"             0.79
    " TOP"                0.78
    "tops"                0.75
    "TOP"                 0.75
    Act Density 0.011%

    No Known Activations