INDEX
    Explanations

    references to clusters and their characteristics

    Negative Logits
    Token               Logit
    " s"                -0.44
    " simplistic"       -0.42
    " di"               -0.40
    " reveal"           -0.40
    " version"          -0.40
    " Hu"               -0.40
    " Segal"            -0.39
    "нент"              -0.38
    " series"           -0.38
    "hu"                -0.38
    Positive Logits
    Token               Logit
    " autorytatywna"    0.70
    " faſt"             0.69
    " كومونز"           0.69
    " ſta"              0.69
    "IsContent"         0.68
    " chofe"            0.66
    "wiſe"              0.66
    " propOrder"        0.66
    " Италијани"        0.65
    " raiſ"             0.65
    Activation Density 0.387%

    No Known Activations