    Explanations

    phrases that communicate simplification or generalization

    Negative Logits
    Token                 Logit
    ' kerap'              -0.59
    ' Uran'               -0.59
    '";\n'                -0.55
    ' Marac'              -0.55
    ' particolarmente'    -0.54
    ' Repair'             -0.53
    ' repaired'           -0.53
    'Discre'              -0.53
    ' Observed'           -0.53
    ' amigurumi'          -0.53
    Positive Logits
    Token                 Logit
    ' basically'          1.75
    'Basically'           1.74
    'basically'           1.68
    ' Basically'          1.66
    'Essentially'         1.51
    'essentially'         1.48
    ' essentially'        1.47
    ' Essentially'        1.43
    ' básicamente'        1.19
    '基本上'              0.86
    Activation Density 0.135%

    No Known Activations