    Negative Logits
        " contextes"    -0.83
        "今は"          -0.83
        "など"          -0.82
        " prawie"       -0.81
                        -0.78
        " megfelelő"    -0.77
        " कैसी"         -0.76
        " quartiers"    -0.75
        " たく"         -0.75
        "こちらは"      -0.75
    Positive Logits
        " honestly"     4.06
        " tbh"          3.59
        " frankly"      3.42
        "honestly"      3.03
        "Honestly"      3.00
        " Honestly"     2.97
        " truth"        2.95
        " TB"           2.89
        " to"           2.80
        "Frankly"       2.59
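    The two lists rank tokens by how strongly this feature raises or lowers their output logits. A common way to build such lists for a sparse-autoencoder feature is to take the dot product of the feature's decoder direction with each token's unembedding vector, then keep the k highest and k lowest scores. The sketch assumes that method; it is not confirmed to be how this page computes its values, and the names `decoder_dir`, `unembed`, `feature_logit_effects`, and `top_and_bottom` are hypothetical. Toy 2-dimensional vectors keep it self-contained.

```python
# Hypothetical sketch: rank tokens by a feature's direct effect on the logits,
# ASSUMING effect(token) = dot(decoder direction, token's unembedding vector).
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def feature_logit_effects(
    decoder_dir: List[float], unembed: Dict[str, List[float]]
) -> Dict[str, float]:
    """Score each token by the dot product of the decoder direction with its unembedding vector."""
    return {
        token: sum(d * u for d, u in zip(decoder_dir, vec))
        for token, vec in unembed.items()
    }


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return (k most promoted tokens, k most suppressed tokens)."""
    ranked = sorted(effects.items(), key=lambda item: item[1])  # ascending by score
    negative = ranked[:k]          # most negative score first
    positive = ranked[::-1][:k]    # most positive score first
    return positive, negative


# Toy 2-dimensional example; a real model uses d_model-sized vectors.
decoder_dir = [1.0, 0.5]
unembed = {
    " honestly": [2.0, 1.0],
    " tbh": [1.5, 1.0],
    "など": [-0.5, -0.5],
    " prawie": [0.25, -1.0],
}
positive, negative = top_and_bottom(feature_logit_effects(decoder_dir, unembed), k=2)
print(positive)  # [(' honestly', 2.5), (' tbh', 2.0)]
print(negative)  # [('など', -0.75), (' prawie', -0.25)]
```

    Quoting each token, as in the lists, keeps a leading space visible, which matters because " honestly" and "honestly" are distinct vocabulary entries.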
    Activation Density 0.076%
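    The density figure is the share of sampled tokens on which the feature fires. A minimal sketch of that calculation follows, assuming "fires" means an activation strictly above zero; the sample corpus and threshold this page uses are not stated, and `activation_density` is a hypothetical helper. The token counts in the example are illustrative only and simply reproduce the 0.076% arithmetic.

```python
# Hypothetical sketch: activation density as the percentage of tokens on which
# a feature's activation is above a threshold (ASSUMED here to be 0.0).
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the percentage of activations strictly greater than `threshold`."""
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("activation_density needs at least one activation")
    return 100.0 * active / total


# Illustrative numbers only: 76 firing tokens out of 100,000 sampled tokens.
sample = [1.0] * 76 + [0.0] * (100_000 - 76)
print(f"{activation_density(sample):.3f}%")  # 0.076%
```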

    No Known Activations