INDEX
    Explanations
    Negative Logits
    Token        Logit
    "("          0.58
    "ทำงาน"      0.56
    " Small"     0.56
    " berühm"    0.54
    " With"      0.53
    " Idee"      0.52
    " Pizza"     0.52
    " גדול"      0.52
    " velké"     0.52
    " +"         0.52
    Positive Logits
    Token                 Logit
    " linguistic"         0.78
    " της"                0.73
    " psychological"      0.73
    " τησ"                0.71
    " cultural"           0.70
    " theological"        0.69
    " των"                0.69
    " tecnológica"        0.67
    " epidemiological"    0.66
    " aesthetic"          0.63
    Act Density 0.085%

    No Known Activations