Explanations
instances of demonstrative pronouns and phrases indicating emphasis or introduction
Negative Logits
ullo      -0.15
щ         -0.15
apons     -0.13
nf        -0.13
ulent     -0.13
vr        -0.13
STUD      -0.13
inous     -0.13
ARGS      -0.13
glas      -0.13
Positive Logits
wasn      0.20
was       0.18
soon      0.18
_was      0.17
incident  0.16
same      0.16
#__       0.16
固        0.16
marked    0.15
当然      0.15
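Logit lists like these are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and reading off the highest and lowest entries. The dashboard does not say how it computed them, so the following is only a sketch under that assumption (direct projection, no LayerNorm folding). The names `decoder_vec`, `W_U`, `logit_weights`, and `top_and_bottom` are hypothetical, and the toy numbers are illustrative rather than taken from this feature.

```python
from typing import List, Sequence, Tuple


def logit_weights(decoder_vec: Sequence[float],
                  W_U: Sequence[Sequence[float]]) -> List[float]:
    """Direct logit contribution of a feature to every vocabulary token.

    decoder_vec: the feature's decoder direction, length d_model (hypothetical).
    W_U: unembedding matrix stored as d_model rows x vocab_size columns.
    """
    d_model = len(decoder_vec)
    vocab_size = len(W_U[0])
    return [sum(decoder_vec[i] * W_U[i][t] for i in range(d_model))
            for t in range(vocab_size)]


def top_and_bottom(weights: Sequence[float], vocab: Sequence[str],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, weight) pairs."""
    order = sorted(range(len(weights)), key=lambda t: weights[t])  # ascending
    negative = [(vocab[t], weights[t]) for t in order[:k]]
    positive = [(vocab[t], weights[t]) for t in reversed(order[-k:])]
    return positive, negative


# Toy example: d_model = 2, vocabulary of 4 placeholder tokens.
decoder_vec = [1.0, -0.5]
W_U = [[0.2, -0.1, 0.0, 0.3],
       [0.1, 0.2, -0.4, 0.0]]
vocab = ["a", "b", "c", "d"]
positive, negative = top_and_bottom(logit_weights(decoder_vec, W_U), vocab, k=2)
print(positive)
print(negative)
```

A real dashboard would run the same ranking over the full vocabulary and keep the top and bottom ten, which matches the length of each list above.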
Activation Density: 0.119%
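Activation density is presumably the share of sampled token positions on which the feature fires, meaning its activation is above zero. A density of 0.119% works out to roughly one token in 840. Here is a minimal sketch of that calculation. It assumes a threshold of zero and a hypothetical list-of-sequences input, and it is not the dashboard's actual code.

```python
from typing import Iterable, Sequence


def activation_density(activations: Iterable[Sequence[float]],
                       threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`.

    activations: one sequence of per-token feature activations per prompt
    (hypothetical input format; a zero threshold is an assumption).
    """
    total = 0
    active = 0
    for seq in activations:
        for a in seq:
            total += 1
            if a > threshold:
                active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return active / total


# Two toy prompts of four tokens each; the feature fires on two of eight tokens.
acts = [[0.0, 1.2, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.5]]
print(f"{activation_density(acts):.3%}")
```

This prints `25.000%` for the toy input. A sparse feature like this one would need a large token sample before its density estimate stabilizes near a value as small as 0.119%.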