INDEX
Explanations
well-being and health state
Negative Logits
dehuman         0.61
Pourquoi        0.61
某些            0.57
meghatá         0.57
只能            0.56
使い方          0.56
Movie           0.56
ineffective     0.54
ifferentiate    0.54
ၽ               0.54
Positive Logits
顺利            1.11
стаби           1.09
estabilidad     1.07
healthy         1.03
saludable       1.03
良好            1.02
سلامت           1.01
स्वस्थ           0.99
health          0.98
stability       0.97
Activations Density: 0.009%