    Explanations

    phrases indicating negation or opposition

    Negative Logits
     ditangkap      -0.59
     Visited        -0.58
     licked         -0.58
    converted       -0.57
     sitter         -0.56
    branded         -0.56
    Pautan          -0.56
    ALLOWED         -0.56
    (blank token)   -0.56
    StructEnd       -0.56
    Positive Logits
     EconPapers     0.69
     being          0.65
    ;"></           0.61
     مرئيه          0.56
    izing           0.56
    kmäler          0.55
     getting        0.55
     making         0.55
    })();\n         0.54
    encodeWith      0.53
    Activation Density: 0.416%

    No Known Activations