INDEX
Explanations
observing and questioning others
Negative Logits
B  0.43
government  0.39
bog  0.38
0.38
áll  0.38
wider  0.36
broader  0.36
ourselves  0.36
قرار  0.35
components  0.34
Positive Logits
生徒  0.47
clientes  0.45
学生的  0.45
enfants  0.45
supervise  0.44
талант  0.43
protege  0.43
학생  0.43
peasants  0.42
ಪ್ರಕರಣ  0.42
Activations Density 0.087%