Negative Logits
acés        0.49
நிக         0.48
us          0.47
any         0.46
dört        0.46
function    0.46
particip    0.46
usik        0.46
замы        0.45
reward      0.44
Positive Logits
dislikes        0.48
torn            0.46
श्रॉफ            0.46
disorder        0.45
contradicted    0.44
crumbled        0.43
telegram        0.43
retorted        0.43
torna           0.43
abnormally      0.42
Activations Density 0.022%