INDEX
    Explanations

    phrases expressing contrast or contradiction

    Negative Logits
    '));\n     -1.10
    ]").       -1.04
    "){\n      -1.02
     "));      -1.02
    "],\n      -0.99
    ")));      -0.99
    "));\n     -0.99
    ")));\n    -0.98
    "];\n      -0.98
    ".\n       -0.97
    Positive Logits
    Linq               0.57
    !                  0.56
     guys              0.53
    NoSuchAlgorithm    0.53
     demek             0.52
     lads              0.49
     thanks            0.48
     vicepresidente    0.47
    もん               0.47
     fellas            0.47
    Act Density 0.184%

    No Known Activations