INDEX
    Explanations

    punctuation and formatting cues in text

    Negative Logits
    __':            -0.73
    __':            -0.72
    __":            -0.69
    __":            -0.67
    <eos>           -0.66
     unknownFields  -0.60
    >");            -0.53
    ">//            -0.52
    }');            -0.50
    ↵↵              -0.50
    Positive Logits
     بيها           0.76
    曖昧さ回避      0.75
     pleaſure       0.67
    ########.       0.67
    odly            0.65
    AnchorStyles    0.64
    cèse            0.64
     يتيمه          0.63
                    0.62
    ibouti          0.61
    Activation Density: 0.716%

    No Known Activations