    Explanations

    Words related to humans organized into groups, such as ethnic groups, political organizations, or medical patients.

    Negative Logits
    Token             Logit
     ProtoMessage     -0.84
    INSEE             -0.83
     Theſe            -0.81
    IVEREF            -0.79
    Välislingid       -0.76
     lenker           -0.75
     HasFactory       -0.75
     utafitiHapana    -0.73
     pinulongan       -0.73
     שוליים           -0.72
    Positive Logits
    Token             Logit
    <eos>             0.64
     that             0.54
     feared           0.50
     you              0.47
     "                0.46
     becoming         0.45
    名は                0.45
     werden           0.44
                      0.44
    ↵↵                0.43
    Act Density 3.850%
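
The density figure above can be read as a simple fraction. The sketch below assumes that "Act Density" means the share of sampled token positions on which the feature has a non-zero activation; that definition, the function name `activation_density`, and the sample data are all illustrative assumptions, not taken from the dashboard itself.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: density = (number of active positions) / (total positions).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical, synthetic sample: 2000 token positions, 77 of them active.
# The counts are chosen only so that the result matches the 3.850% shown above.
sample = [0.0] * 1923 + [1.5] * 77
print(f"{activation_density(sample):.3%}")
```

Under this reading, a density of 3.850% would mean the feature fires on roughly 1 in 26 tokens of whatever corpus was sampled.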

    No Known Activations