INDEX
    Explanations

    references to numerical values and their corresponding significance in context

    Negative Logits
    "earable"           -0.70
    "anan"              -0.66
    "76561"             -0.64
    "heit"              -0.64
    "TPPStreamerBot"    -0.63
    "utor"              -0.61
    "uggest"            -0.59
    "Creat"             -0.58
    "ariat"             -0.55
    "bryce"             -0.55
    Positive Logits
    " etc"              0.80
    " respectively"     0.79
    "+,"                0.72
    "+."                0.71
    " istg"             0.67
    ","                 0.65
    "-,"                0.65
    ",..."              0.64
    " increments"       0.61
    " architectures"    0.61
    Activation Density: 0.027%

    No Known Activations