Explanations
relationships and interactions between different subjects or characters
Negative Logits
Token     Logit
èm        -0.15
afil      -0.15
uppe      -0.14
olon      -0.14
ronics    -0.14
ksen      -0.14
oltip     -0.14
ftar      -0.13
prung     -0.13
arias     -0.13
Positive Logits
Token     Logit
both      0.49
Both      0.46
both      0.46
Both      0.45
ambos     0.43
两人      0.43
_both     0.43
beide     0.42
mutual    0.42
BOTH      0.41
Activations Density 0.555%