INDEX
Explanations
Substitute, Needs, strategic
Negative Logits
en      0.64
race    0.61
anvil   0.52
смя     0.49
hled    0.48
qal     0.48
motif   0.48
ppe     0.47
ell     0.46
prize   0.46
Positive Logits
وں      0.52
연      0.45
영화    0.45
ہی      0.45
は      0.44
은      0.44
ون      0.43
او      0.43
는      0.43
是很    0.43
Activations Density 0.000%