Explanations
government performance protection
Negative Logits
및    0.43
ACTIVITY    0.41
lø    0.40
جنگ    0.39
مطلب    0.39
لیم    0.38
ACTIVITY    0.38
активности    0.38
suivant    0.37
తె    0.37
Positive Logits
began    0.45
crept    0.44
testified    0.40
seemed    0.40
hesitated    0.40
ventured    0.39
doth    0.39
fled    0.39
racting    0.39
wondered    0.39
Activations Density 0.000%