Explanations
describing physical attributes
Negative Logits
ﻒ  0.54
Categ  0.52
Damage  0.51
क्ष्य  0.47
なく  0.47
砾  0.47
layoff  0.46
ूरिया  0.46
ना  0.46
즙  0.46
Positive Logits
conquered  0.41
comercio  0.40
conquer  0.39
han  0.39
authoritarian  0.39
supremo  0.39
dirigeants  0.38
encargado  0.38
totalitarian  0.37
متنوع  0.37
Activations Density: 0.002%