INDEX
    Explanations

    phrases implying strong personal opinions or reflections

    Negative Logits
    " disadvant"    -0.91
    " fortun"       -0.71
    " princ"        -0.71
    " obser"        -0.69
    " psychiat"     -0.69
    "undown"        -0.68
    " vulner"       -0.68
    " Seym"         -0.67
    " fodder"       -0.66
    " Palestin"     -0.65
    Positive Logits
    "ï¸ı"           1.21
    "âĻ"            0.83
    "own"           0.81
    "女"            0.81
    "âĹ"            0.80
    "ï¸"            0.78
    "Ì"             0.77
    "âĶĢâĶĢ"        0.76
    "âĶĢâĶĢâĶĢâĶĢ"  0.76
    "âĸł"           0.75
    Act Density 0.230%

    No Known Activations