Explanations
evidence of societal roles and expectations in narratives
Negative Logits
придется      -0.16
tility        -0.15
CRET          -0.14
otal          -0.14
avra          -0.14
presso        -0.14
likelihood    -0.14
olla          -0.14
ồi            -0.14
_EXISTS       -0.13
Positive Logits
supposed      1.05
suppose       0.79
meant         0.60
supposedly    0.60
purported     0.54
alleged       0.54
allegedly     0.49
intended      0.43
Suppose       0.42
SUP           0.40
Activation density: 0.389%
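Activation density is usually read as the percentage of sampled tokens on which the feature fires. At 0.389%, that is roughly 3.9 tokens in every 1,000. The short Python sketch below shows that calculation. The helper name `activation_density` and the "fires" rule (activation strictly above a threshold, default 0) are assumptions, not something this page specifies.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose activation exceeds `threshold`."""
    values = list(activations)
    if not values:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in values if a > threshold)  # tokens where the feature fires
    return 100.0 * active / len(values)


if __name__ == "__main__":
    # 389 active tokens out of 100,000 sampled tokens corresponds to 0.389%.
    sample = [1.0] * 389 + [0.0] * 99_611
    print(f"{activation_density(sample):.3f}%")
```

A strict `>` comparison treats zero activations (the usual output of a ReLU-style feature that is off) as inactive.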