INDEX
    Explanations

    words indicative of human experiences or personal narratives

    Negative Logits
    " several"        -0.71
    "several"         -0.69
    " Several"        -0.67
    "many"            -0.65
    "Several"         -0.63
    "BeginInit"       -0.63
    " many"           -0.60
    " externi"        -0.59
    "Usual"           -0.58
    " banyak"         -0.58

    Positive Logits
    " who"            0.63
    " privées"        0.60
    " kto"            0.59
    " whom"           0.56
    " antaranya"      0.56
    "RegressionTest"  0.55
    " anonyme"        0.54
    " pubblici"       0.54
    " dévelo"         0.53
    " veramente"      0.51
    Act Density 0.254%

    No Known Activations