INDEX
    Explanations

    phrases related to measurements or comparisons

    references to measurement scales or frameworks

    Negative Logits
    "esson"     -0.79
    "uala"      -0.75
    "vous"      -0.72
    "hiro"      -0.69
    "olulu"     -0.67
    "unal"      -0.67
    "WOR"       -0.64
    "nor"       -0.63
    "èĥ"        -0.63
    "selves"    -0.63
    Positive Logits
    " scale"      1.07
    " scales"     0.90
    " Scale"      0.81
    "scale"       0.77
    "itized"      0.76
    " invari"     0.75
    " replica"    0.75
    "craft"       0.72
    "enter"       0.67
    " scaled"     0.65
    Activation Density: 0.009%

    No Known Activations