INDEX
    Explanations

    words related to innovations or advancements in technology and systems

    Negative Logits
    -1.02
    -0.92
    -0.89
     •
    -0.87
     

    -0.85
     ‌
    -0.84
    . 
    -0.79
     < 
    -0.75
     ​
    -0.74
     →
    -0.74
    Positive Logits
    Token         Logit
    " youll"      1.82
    " youre"      1.80
    " theyre"     1.75
    " Thats"      1.67
    " didnt"      1.65
    " doesnt"     1.65
    "Dont"        1.61
    " Dont"       1.61
    " isnt"       1.61
    " wasnt"      1.60
    Activation Density: 0.195%

    No Known Activations