INDEX
Explanations
activations of the pronouns "He," "We," and "my," indicating a focus on personal or collective experiences
Negative Logits
ķ             -0.15
.Unity        -0.15
''''          -0.13
ë³´ëĤ´ê¸°     -0.13
-REAL         -0.13
lož           -0.13
ÑģÑĤвоÑĢ      -0.13
ODEV          -0.13
abi           -0.13
redd          -0.13
Positive Logits
rtype         0.14
unar          0.13
ourt          0.13
@Web          0.13
asic          0.13
pedia         0.12
consec        0.12
CEF           0.12
Pradesh       0.12
intptr        0.12
Activations Density: 0.463%