Explanations
The word 'par', with varying activation strengths; this may indicate a focus on text related to specific topics or entities.
References to paragraphs or sections within a document.
Negative Logits
Token       Logit
ħĭ          -0.94
¥µ          -0.86
doms        -0.75
æ©Ł         -0.68
ĨĴ          -0.68
customary   -0.67
urst        -0.67
OME         -0.65
å¸          -0.65
houses      -0.63
Positive Logits
Token       Logit
allel        1.20
liament      1.03
anoia        0.93
imony        0.89
vati         0.87
cel          0.80
ity          0.80
ippi         0.79
amount       0.78
ion          0.78
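As a rough check on the first explanation, many of the top positive-logit tokens read as suffixes that complete "par" into a word. The sketch below assumes each token is appended directly to the prefix "par"; the prefix and the function name `completions` are illustrative assumptions, not part of the dashboard.

```python
# Top positive-logit tokens as listed in the dashboard, with their logit values.
POSITIVE_LOGITS = {
    "allel": 1.20, "liament": 1.03, "anoia": 0.93, "imony": 0.89, "vati": 0.87,
    "cel": 0.80, "ity": 0.80, "ippi": 0.79, "amount": 0.78, "ion": 0.78,
}

# Assumed prefix, taken from the explanation that the feature relates to "par".
PREFIX = "par"

def completions(prefix, tokens):
    """Append each suffix token to the prefix, highest logit first.

    sorted() is stable, so tokens with equal logits keep their listed order.
    """
    ranked = sorted(tokens.items(), key=lambda kv: kv[1], reverse=True)
    return [(prefix + tok, logit) for tok, logit in ranked]

for word, logit in completions(PREFIX, POSITIVE_LOGITS):
    print(f"{word:<12} {logit:.2f}")
```

Most results are ordinary words (parallel, parliament, paranoia, parsimony, parcel, parity, paramount) or a proper noun (Parvati), while "parippi" and "parion" are not, so the pattern supports the "par" explanation without proving it.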
Activation Density: 0.006%
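Activation density is commonly reported for sparse-autoencoder features as the fraction of tokens on which the feature activates above zero; the dashboard does not state its definition, so that reading (and the `threshold` parameter below) is an assumption. Under it, 0.006% is about 60 activating tokens per million, which marks this as a very sparse feature. A minimal sketch:

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumed definition: a token counts as active when its activation > threshold.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

def expected_active_tokens(density_percent, n_tokens):
    """Convert a density given in percent into an expected number of active tokens."""
    return density_percent / 100 * n_tokens

# Toy data: 3 of 50,000 tokens have a nonzero activation.
toy = [0.0] * 50_000
for i in (10, 2_000, 40_000):
    toy[i] = 1.5

print(f"{activation_density(toy) * 100:.3f}%")                 # 0.006%
print(f"{expected_active_tokens(0.006, 1_000_000):.1f} tokens")  # 60.0 tokens
```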