Explanations
listing respective components
Negative Logits
Token      Logit
Another    1.06
another    1.02
や         0.97
และ        0.94
や         0.89
이나       0.88
other      0.86
和         0.85
ಅಥವಾ       0.84
и          0.84
Positive Logits
Token            Logit
respectively     2.57
respectivamente  2.30
각각             2.29
それぞれ         2.24
respectivement   2.02
alike            1.92
jeweils          1.91
それぞれの       1.90
respective       1.89
各自             1.86
Activations Density 0.807%