Explanations
references to instances of "Both", indicating a focus on shared or similar elements
Negative Logits
variously  -0.60
nejen      -0.58
abetes     -0.57
ocide      -0.55
hassee     -0.55
urez       -0.54
blik       -0.52
hnia       -0.52
rinol      -0.52
Esq        -0.52
Positive Logits
Both       1.35
Both       1.18
Beide      0.95
Ambos      0.95
Ambos      0.87
båda       0.79
Neither    0.78
BOTH       0.76
Neither    0.74
peggio     0.72
Activations Density: 0.006%
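
To put the density figure in concrete terms, the sketch below converts a percentage into an expected number of activating tokens. It assumes that the density denotes the fraction of tokens on which the feature has a nonzero activation; the page does not define the term, so this reading is an assumption. The function name `expected_activations` and the sample size of one million tokens are hypothetical, chosen only for illustration.

```python
def expected_activations(density_percent: float, n_tokens: int) -> float:
    """Expected number of tokens on which the feature fires.

    Assumes density_percent is the percentage of tokens with a
    nonzero activation (an assumption, not stated by the source).
    """
    return density_percent / 100.0 * n_tokens


# Hypothetical sample size: one million tokens at the reported 0.006% density.
print(expected_activations(0.006, 1_000_000))
```

Under that assumption, the feature would fire on roughly 60 tokens per million, which is consistent with a feature tied to a narrow set of tokens.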