Explanations
emergence and manifestation of phenomena
Negative Logits
'        0.49
-        0.46
\        0.41
נים      0.36
ו        0.36
ové      0.34
/        0.33
ד        0.33
Jenis    0.33
Precio   0.33
Positive Logits
ina          0.36
in           0.35
così         0.35
なか         0.35
IT           0.34
undeniably   0.34
compre       0.33
майже        0.33
calamity     0.33
awfully      0.33
Activations Density 0.087%