    Explanations

    phrases indicating conditional or causal relationships

    Punctuation followed by specific words

    Negative Logits
    " comigo"           -0.69
    " vantagem"         -0.66
    "<bos>"             -0.65
    " stedet"           -0.64
    " prochaines"       -0.63
    " legais"           -0.61
    " conmigo"          -0.60
    " meus"             -0.60
    "DeleteBehavior"    -0.59
    "CodeAttribute"     -0.59
    Positive Logits
    ")\";\n"            0.88
    ")\");\n"           0.86
    " some"             0.76
    " large"            0.75
    ")\"),"             0.75
    "'],\n"             0.74
    "'),\n"             0.73
    " certain"          0.70
    ".\")\n"            0.70
    " ';\n"             0.69
    Act Density 0.510%

    No Known Activations