    NEGATIVE LOGITS
    Token                Logit
    " cheerful"          0.59
    " cheery"            0.57
    " gloomy"            0.56
    " мастер"            0.56
    " lunar"             0.55
    " focus"             0.55
    " đào"               0.53
    " infrastructure"    0.52
    "Stre"               0.52
    " andre"             0.52
    POSITIVE LOGITS
    Token                Logit
    " respectivement"    1.20
    "それぞれ"           1.19
    " respectively"      1.16
    "respectively"       1.09
    " entrambi"          1.03
    "分别"               0.97
    " beiden"            0.96
    " दोनों"              0.96
    "どちら"             0.95
    " 각각"              0.95
    Activation Density: 0.770%

    No Known Activations