INDEX
Explanations
instances of the special characters used for marking the beginning of text segments or code blocks
Negative Logits
-0.98
propOrder    -0.86
raiſ         -0.80
itſelf       -0.80
wikipagina   -0.74
Huguen       -0.73
Houſe        -0.70
DKK          -0.67
nakalista    -0.66
houſe        -0.65
Positive Logits
the      1.60
THE      1.45
The      1.44
The      1.36
THE      1.17
rethe    1.15
enthe    1.11
sthe     1.10
ithe     1.02
the      1.00
Activations Density 0.074%