INDEX
Explanations
sources or attributions in a document
references to various sources or citations in a text
Negative Logits
Token      Logit
estern     -0.83
oğ         -0.76
okers      -0.71
oso        -0.69
destro     -0.68
apo        -0.67
hma        -0.67
psey       -0.67
eg         -0.67
cumbers    -0.65
Positive Logits
Token      Logit
Sources    1.00
Fed        0.94
Source     0.92
source     0.83
Forge      0.81
ource      0.81
Republic   0.77
books      0.76
Cub        0.74
Republic   0.74
Activation Density: 0.016%