    Explanations
    No Explanations Found
    Negative Logits
    ɚ                 -0.64
    AndEndTag         -0.62
     Amerikaanse      -0.54
     creș             -0.53
    在美国            -0.53
     argint           -0.52
     Infórmanos       -0.51
     Daß              -0.51
     amerikanischen   -0.51
    ității            -0.51
    Positive Logits
     UK               1.12
     Britain          1.05
     £                1.03
     British          1.02
     (£               0.96
    英国              0.96
     BRITISH          0.94
    Whilst            0.93
     Whilst           0.93
    0.93
    Activation Density 0.000%

    No Known Activations

    This feature has no known activations.