Explanations
mentions of social dynamics and inequalities
conditional statements and comparative expressions indicating change, improvement, or hypothetical scenarios.
Negative Logits
Besonders       -0.30
だけでなく      -0.27
super           -0.24
'../../../../   -0.23
specifically    -0.23
lika            -0.23
だけではなく    -0.23
よね            -0.22
[]*             -0.22
conformidad     -0.22
Positive Logits
somewhat        1.50
somewhat        1.49
marginally      1.48
parcialmente    1.39
slightly        1.39
partially       1.38
Somewhat        1.37
Somewhat        1.34
immerhin        1.34
slightly        1.32
Activations Density 1.466%