INDEX
    Explanations

    phrases indicating actions or events related to conflict or struggle

    Negative Logits
    ")]         -0.80
    ...");      -0.80
    '},         -0.75
    ";          -0.75
    :");        -0.74
    "):         -0.73
    '])){       -0.73
    )";         -0.72
    ");         -0.72
    .");        -0.71
    Positive Logits
    2.76
    TagMode
    0.52
    ↵↵↵
    0.43
    IsContent
    0.40
    "=>"
    0.39
    .*;
    0.38
    </strong>
    0.38
    })$}
    0.37
    </em>
    0.37
    </h2>
    0.36
    Activation Density: 1.023%

    No Known Activations