    NEGATIVE LOGITS
    Token               Logit
    abhuto              0.48
     സ്ഥാപ               0.47
     ඔවුන්               0.45
     노력                0.45
     employeeService    0.44
    图书馆              0.44
    তাহাকে              0.43
    <unused7>           0.42
    owała               0.42
     offrir             0.42
    POSITIVE LOGITS
    Token               Logit
     phenomena          0.84
     phenomenon         0.80
     affects            0.79
     effects            0.78
     caused             0.78
     affect             0.76
    影響                0.76
     detrimental        0.75
     interference       0.73
     distortion         0.73
    Activation Density: 0.278%

    No Known Activations