INDEX
    Explanations

    descriptions of social status and relationships

    Negative Logits
     INTERRU            -0.14
    ента                -0.14
     mort               -0.14
    enstein             -0.14
     unaware            -0.14
     ParseException     -0.13
     instagram          -0.13
    人気                -0.13
     Zwe                -0.13
     Lesser             -0.13
    Positive Logits
     smoker             0.18
     gentleman          0.18
    .seek               0.17
     singles            0.17
    Smoke               0.16
    ingles              0.16
    ozy                 0.16
     honest             0.16
     discrete           0.16
     smoke              0.15
    Activation Density 0.174%

    No Known Activations