    Explanations

    phrases related to conflict and confrontation

    repeated special characters or symbols, particularly the "Ŀ"

    Negative Logits
    Token           Logit
    " obser"        -0.75
    " incorpor"     -0.71
    " disadvant"    -0.69
    " ende"         -0.68
    " incent"       -0.67
    " Palestin"     -0.66
    " contrace"     -0.65
    " mathemat"     -0.64
    " unwanted"     -0.63
    " sacrific"     -0.62

    Positive Logits
    Token           Logit
    "ï¸ı"           0.95
    "¯"             0.95
    "ï¸"            0.81
    "ÃĽ"            0.77
    "âĢł"           0.76
    "ttp"           0.74
    "°"             0.74
    "âĻ"            0.73
    "cue"           0.72
    "tra"           0.72
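
Several of the positive-logit tokens (and the "Ŀ" named in the explanation) look like byte-level BPE labels rather than ordinary text. The sketch below assumes, without confirmation from this page, that the labels follow the GPT-2 style byte-to-character table; under that assumption it maps a displayed label back to its raw bytes and tries to decode them as UTF-8. The helper names `token_label_to_bytes` and `describe` are hypothetical and exist only for this illustration.

```python
def bytes_to_unicode():
    """Byte -> printable character table in the style of GPT-2 byte-level BPE.

    Assumption: the dashboard's tokenizer uses this table. Printable bytes map
    to themselves; the remaining 68 bytes map to code points 256 and up.
    """
    bs = (list(range(0x21, 0x7F))      # '!' .. '~'
          + list(range(0xA1, 0xAD))    # U+00A1 .. U+00AC
          + list(range(0xAE, 0x100)))  # U+00AE .. U+00FF
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: displayed character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def token_label_to_bytes(label):
    """Convert a displayed token label back to the raw bytes it stands for."""
    return bytes(UNICODE_TO_BYTE[ch] for ch in label)


def describe(label):
    """Return (raw_bytes, decoded_text); decoded_text is None if not valid UTF-8."""
    raw = token_label_to_bytes(label)
    try:
        return raw, raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw, None


if __name__ == "__main__":
    # Labels copied from the lists above, written as escapes to avoid
    # depending on the encoding of this file.
    for label in ["\u00e2\u0122\u0142", "\u00ef\u00b8\u0131", "\u013f", "ttp"]:
        raw, text = describe(label)
        print(repr(label), raw.hex(" "), repr(text))
```

If the assumption holds, "âĢł" stands for the bytes E2 80 A0 (the dagger sign, U+2020) and "ï¸ı" for EF B8 8F (variation selector U+FE0F), while "Ŀ" is the single byte 9D, a UTF-8 continuation byte that is not a complete character on its own. That would be consistent with the explanation describing repeated special characters or symbols, but it remains an inference from the labels, not something the page states.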
    Act Density 0.184%

    No Known Activations