    Negative Logits
    Token             Logit
    " 모델"            -0.80
    " מכל"            -0.79
                      -0.73
    " космо"          -0.73
    " xét"            -0.72
    "itaji"           -0.71
    " approval"       -0.71
    "ódó"             -0.69
    " Illuminated"    -0.69
    "観察"            -0.69
    Positive Logits
    Token             Logit
    " optical"        3.33
    " illusion"       3.23
    " illusions"      2.75
    "optical"         2.72
    " Optical"        2.58
    "Optical"         2.56
    "illusion"        2.39
    " Illusion"       2.27
    " tricks"         2.14
    " опти"           2.05
    Activation Density: 0.050%

    No Known Activations