INDEX
Explanations
ethnic pride, breeds, cloth, survival
Negative Logits
Subject  0.43
ys  0.42
STRICT  0.42
Biology  0.41
intracellular  0.40
Mon  0.40
requesting  0.40
θεί  0.40
Observed  0.39
anis  0.39
Positive Logits
”،  0.47
絴  0.47
그런  0.46
станда  0.45
вра  0.45
포  0.44
서는  0.44
행사  0.44
පිළිබඳ  0.44
人士  0.43
Activations Density: 0.002%