INDEX
    Explanations

    phrases related to legality or illegality

    negations or words indicating the absence of something

    Negative Logits
     Jihad        -0.67
     Nanto        -0.64
     Jem          -0.63
     Gaul         -0.63
     Crus         -0.63
     looms        -0.63
     clitor       -0.63
     Skydragon    -0.62
     mathemat     -0.62
     Jinn         -0.61
    Positive Logits
    agree         0.95
    ï¸ı           0.92
    ï¸            0.90
    ever          0.89
    sure          0.88
    ude           0.87
    yet           0.87
    emb           0.83
    else          0.83
    İ             0.83
    Act Density 0.169%

    No Known Activations