INDEX
    Explanations

    phrases that express surprise or unexpectedness

    Negative Logits
    Token                                   Logit
    "まで"                                  -0.49
    "pagne"                                 -0.49
    " addirittura"                          -0.47
    " ||" (followed by a line break)        -0.47
    "qrstuvwxyz"                            -0.47
    "Contributed"                           -0.46
    "const"                                 -0.46
    " nakalista"                            -0.46
    "even"                                  -0.46
    " pot"                                  -0.45
    Positive Logits
    Token                                   Logit
    " للاسماء"                              1.09
    "verständlich"                          0.94
    "følgelig"                              0.86
    "eraard"                                0.85
    " ovviamente"                           0.82
    " obviamente"                           0.81
    " ujednoznacz"                          0.81
    " natürlich"                            0.77
    " مشين"                                 0.72
    " Natürlich"                            0.70
    Activation Density: 0.043%

    No Known Activations