INDEX
    Explanations

    abstract concepts following "of"

    Negative Logits
    ທ່ານ
    0.54
    ح
    0.54
    У
    0.54
    the
    0.53
     Произ
    0.52
    S
    0.51
     échantillons
    0.51
     Обра
    0.49
    Obviously
    0.49
    they
    0.47
    Positive Logits
     sorts
    1.01
     course
    0.81
     interest
    0.74
     colonialism
    0.65
     normalcy
    0.62
     betrayal
    0.62
     disbelief
    0.62
     contention
    0.61
     interplay
    0.59
     wrongdoing
    0.59
    Activation Density 0.807%

    No Known Activations