    Negative Logits
    Value  Token
    1.00   👌
    0.95
    0.93   👍
    0.88   😂
    0.88   😁
    0.86   😊
    0.86   ########
    0.86   (#
    0.85
    0.85   😘
    Positive Logits
    Value  Token
    0.64   Wor
    0.63   Turn
    0.62   if
    0.59   Passenger
    0.59   भी
    0.59   Temperature
    0.59   Second
    0.59   또한
    0.59   Moment
    0.58   หาก
    Activation Density: 0.104%

    No Known Activations