    NEGATIVE LOGITS
    " нена"             -0.50
    "حوالہ"             -0.48
    "Tembelea"          -0.47
    "Abitanti"          -0.47
    " صوتيه"            -0.45
    " recev"            -0.43
    "WriteTagHelper"    -0.41
    " sztu"             -0.40
    "Set"               -0.40
    "تقاوى"             -0.40
    POSITIVE LOGITS
    " Gaulle"           0.65
    " purpoſe"          0.64
    "<bos>"             0.62
    " calendriers"      0.61
    "באנגלית"           0.61
    "WAII"              0.60
                        0.60
    "OGND"              0.59
    "kaido"             0.59
                        0.58
    Activation Density: 0.023%

    No Known Activations