INDEX
    Explanations

    comparisons

    Negative Logits
    "angelog"     -0.09
    "BBW"         -0.09
    "imary"       -0.08
    " agrade"     -0.08
    "arbeiter"    -0.08
    "erus"        -0.08
    " fumes"      -0.08
    "hew"         -0.08
    "gefühl"      -0.08
    "Wire"        -0.07
    Positive Logits
    " comparisons"    0.15
    " comparison"     0.14
    " Compar"         0.14
    " comparar"       0.14
    " comparing"      0.13
    "比较"            0.13
    " Comparing"      0.13
    "comparison"      0.13
    " Comparison"     0.13
    " срав"           0.13
    Activation Density: 0.017%

    No Known Activations