    Explanations

    references to psychology and related concepts

    Negative Logits
     Jörg             -1.01
     CreateTagHelper  -0.94
     SDC              -0.88
    ampton            -0.86
     Paro             -0.86
    łgorzata          -0.85
    nościo            -0.83
     bounties         -0.83
     Jock             -0.82
     AMR              -0.82
    Positive Logits
     Kro              0.89
     ль               0.86
    Kro               0.74
    ın                0.73
     Buck             0.70
     Fisk             0.69
     pis              0.68
    úcar              0.68
     Crist            0.68
     Holland          0.67
    Activation Density: 0.484%

    No Known Activations