    NEGATIVE LOGITS
    ."));      -1.39
    .";        -1.27
    ".         -1.23
    %");       -1.21
    .");       -1.16
    "]);       -1.16
     }}$}      -1.15
    ")));      -1.14
    .",        -1.13
    "])        -1.13
    POSITIVE LOGITS
    ↵↵         0.81
     The       0.71
     A         0.69
     Sugar     0.61
    DockStyle  0.61
               0.59
     At        0.58
     In        0.57
     However   0.56
     For       0.56
    Act Density 0.140%

    No Known Activations