Explanations
the introduction of significant concepts or topics within the text
Negative Logits
EconPapers    -1.29
bezeichneter  -1.21
Efq           -1.17
myſelf        -1.16
itſelf        -1.15
pleaſure      -1.14
raiſ          -1.13
purpoſe       -1.12
abestanden    -1.11
―――――         -1.06
Positive Logits
0.70
↵↵
0.69
↵
0.62
"
0.60
c
0.57
1
0.57
↵↵↵
0.56
...
0.56
</em>
0.55
"
0.55
Activations Density 0.008%