INDEX
Explanations
sister, brother, influenced
Negative Logits
ם  0.46
ase  0.45
కు  0.43
పేరు  0.43
你需要  0.43
ס  0.42
不起  0.41
,\  0.40
iyal  0.40
kia  0.40
Positive Logits
scaff  0.48
primates  0.45
Monochrome  0.44
turismo  0.43
anthropology  0.43
raisal  0.43
Westen  0.43
monochrome  0.42
grayscale  0.42
Condition  0.41
Activations Density: 0.001%