Explanations
academic citations and bibliographies
Negative Logits
Token       Logit
all         -1.13
どれも      -1.06
どれ        -1.06
全て        -1.04
みんなで    -1.01
すべて      -0.98
semuanya    -0.98
三种        -0.95
tất         -0.94
כול         -0.90
Positive Logits
Token       Logit
both        3.11
Both        2.58
BOTH        2.52
both        2.41
Both        2.33
beide       2.33
beiden      2.31
entrambi    2.27
ambos       2.23
BOTH        2.23
Activations Density 0.009%