INDEX
NEGATIVE LOGITS
навіть	0.40
unità	0.39
новом	0.38
masterpieces	0.37
高效	0.36
هنوز	0.36
effic	0.36
effiz	0.36
효율	0.36
semplici	0.35
POSITIVE LOGITS
कथित	0.50
controversial	0.49
suspected	0.45
controvers	0.43
alleged	0.42
problematic	0.42
allegedly	0.38
controversy	0.38
troubled	0.38
ആരോപ	0.38
Activations Density 0.132%