INDEX
    Explanations

    personal descriptors or characteristics

    references to personal experiences or opinions

    Negative Logits
    Token           Logit
    "xual"          -1.13
    "LER"           -0.79
    "ï¸"            -0.77
    "GAN"           -0.72
    "GGGG"          -0.72
    "ktop"          -0.72
    "etting"        -0.70
    "XM"            -0.70
    " Faster"       -0.69
    " Tens"         -0.69

    Positive Logits
    Token           Logit
    "ised"           1.18
    "ized"           1.03
    "ities"          0.99
    "ization"        0.95
    " belongings"    0.94
    "isations"       0.93
    "isation"        0.90
    " pronouns"      0.89
    " hygiene"       0.87
    " trainer"       0.86
    Activation Density: 0.016%

    No Known Activations