    Explanations

    hyperbole and exaggeration

    Negative Logits
    " crucially"      0.47
    " সাধারণভাবে"      0.46
    " যেহেতু"          0.45
    "やはり"          0.44
    " importantly"    0.44
    " też"            0.44
    " 되겠죠"          0.43
    "やっぱり"        0.42
    " 😊"             0.42
    " tricky"         0.42
    Positive Logits
    " almost"         1.38
    "几乎"            1.29
    "almost"          1.28
    " literally"      1.27
    " quase"          1.26
    " practically"    1.23
    " literalmente"   1.23
    "literally"       1.22
    "幾乎"            1.21
    " почти"          1.19
    Activation Density: 0.048%

    No Known Activations