Explanations
experimental treatment and control groups
Negative Logits
बंधित          0.44
家族           0.41
предыду        0.41
व्यक्ति          0.40
individuales   0.38
வ்வாற          0.38
ских           0.38
ملاقات         0.38
ան             0.38
হরণ            0.37
Positive Logits
experimental   1.08
Experimental   1.05
Experimental   0.98
experimental   0.96
эксперимента   0.82
intervention   0.81
Treatment      0.80
treatment      0.79
Treatment      0.79
Intervention   0.77
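The logit lists above rank output tokens by how strongly this feature pushes them up or down. The sketch below shows one common way such lists can be produced, under stated assumptions: it treats the feature's decoder direction as a vector, takes its dot product with each token's column of the unembedding matrix, and sorts. The names logit_effects, top_and_bottom and all TOY_ values are hypothetical, and the numbers are made up; they are not this feature's weights.

```python
# Toy sketch of how a feature's "positive" and "negative" logit lists could be
# produced. ASSUMPTIONS: the lists come from projecting the feature's decoder
# direction through the model's unembedding matrix; every vector, matrix and
# token below is a made-up toy value, not data from this feature.

def logit_effects(decoder_dir, unembed, vocab):
    """Map each token to the dot product of decoder_dir with that token's
    unembedding column (unembed is d_model rows by len(vocab) columns)."""
    effects = {}
    for j, token in enumerate(vocab):
        effects[token] = sum(
            decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir))
        )
    return effects


def top_and_bottom(effects, k):
    """Return the k highest-scoring (promoted) tokens and the k lowest-scoring
    (suppressed) tokens, each as a list of (token, score) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k], ranked[::-1][:k]


# Hypothetical toy model: d_model = 2, vocabulary of 4 tokens.
TOY_VOCAB = ["experimental", "treatment", "家族", "ских"]
TOY_DECODER_DIR = [1.0, 0.5]
TOY_UNEMBED = [
    [1.0, 0.8, -0.3, -0.1],  # row for hidden dimension 0
    [0.2, 0.0, -0.4, 0.1],   # row for hidden dimension 1
]

if __name__ == "__main__":
    effects = logit_effects(TOY_DECODER_DIR, TOY_UNEMBED, TOY_VOCAB)
    positive, negative = top_and_bottom(effects, k=2)
    print("positive:", positive)
    print("negative:", negative)
```

With a real model the vocabulary has tens of thousands of entries, so in practice one would use a tensor library's matrix product and a top-k routine rather than Python loops; the ranking logic is the same.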
Activation density: 0.019%
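A density of 0.019% means the feature fired on roughly 19 of every 100,000 tokens in whatever sample was used, so it is a very sparse feature. The sketch below assumes density is the percentage of sampled tokens whose activation exceeds a threshold of 0; that definition, the function name activation_density and the synthetic sample are illustrative assumptions, not this dashboard's documented method.

```python
# Toy sketch of an "activation density" figure. ASSUMPTION: density is the
# percentage of sampled tokens on which the feature's activation is above a
# threshold (0 here); the sample below is synthetic, not this feature's data.

def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


if __name__ == "__main__":
    # Synthetic sample: 100,000 tokens, the feature fires on 19 of them.
    sample = [0.0] * 100_000
    for idx in range(19):
        sample[idx * 5_000] = 1.0
    print(f"{activation_density(sample):.3f}%")  # prints 0.019%
```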