    Explanations

    phrases that express actions or intentions related to trying, playing, and using tools or methods

    Negative Logits
    "PostInfinity"           -0.47
    " biais"                 -0.33
    "Rohy"                   -0.29
    "تاح"                    -0.27
    "جمعیت"                  -0.26
    " Ilustra"               -0.26
    (token not displayed)    -0.26
    " Clippers"              -0.26
    " chó"                   -0.26
    " cámara"                -0.25
    Positive Logits
    " experimenting"         1.81
    " experimentation"       1.73
    " messing"               1.68
    " experiment"            1.67
    "Experiment"             1.63
    " Experiment"            1.63
    " tinkering"             1.60
    "experiment"             1.58
    " experimented"          1.55
    " tinker"                1.47
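The two lists above rank tokens by how strongly this feature lowers or raises their output logits. The page does not say how the values are computed. The sketch below assumes the definition common in sparse-autoencoder dashboards: the feature's decoder direction is projected through the model's unembedding matrix. The function name `top_logit_effects`, the array shapes, and the toy vocabulary are hypothetical.

```python
import numpy as np


def top_logit_effects(decoder_dir, unembed, vocab, k=10):
    """Return the k most suppressed and k most boosted tokens for one feature.

    Hypothetical shapes: decoder_dir is (d_model,), unembed is
    (d_model, vocab_size), and vocab is a list of vocab_size token strings.
    """
    # Assumed definition: the direct effect on every output logit of adding
    # the feature's decoder direction to the residual stream.
    effects = decoder_dir @ unembed
    order = np.argsort(effects)  # token indices, most negative effect first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example with made-up numbers (not taken from this dashboard).
decoder_dir = np.array([1.0, 0.0])
unembed = np.array([[1.8, -0.5, -0.2, 1.6],
                    [0.3, 0.2, -0.4, 0.9]])
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
neg, pos = top_logit_effects(decoder_dir, unembed, vocab, k=2)
print(neg)
print(pos)
```

In this toy case `tok_b` and `tok_c` come out as the most suppressed tokens and `tok_a` and `tok_d` as the most boosted. Token strings are kept verbatim, so variants such as `"Experiment"` and `" Experiment"` (leading space) are ranked separately, as in the lists above.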
    Act Density 0.489%
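Act Density is usually read as the share of token positions on which the feature fires. On that reading, 0.489% works out to roughly 5 tokens in every 1,000. The sketch below computes the figure under one assumption that this page does not confirm: a token counts as active when its feature activation is strictly greater than zero. The function `activation_density` and its `threshold` parameter are hypothetical.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`.

    Hypothetical helper: `activations` holds one feature's activation per
    token position, in any array shape.
    """
    acts = np.asarray(activations, dtype=float)
    active = np.count_nonzero(acts > threshold)  # positions where the feature fires
    return 100.0 * active / acts.size


# Toy activations for 10 token positions (made up): 2 of them fire.
print(activation_density([0, 0, 0.5, 0, 0, 0, 0, 0, 0, 1.2]))  # 20.0
```

Raising `threshold` above zero would ignore very weak activations, so dashboards using a different cutoff could report a different density for the same feature.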

    No Known Activations