INDEX
Explanations
Points in the text with significant activation, marking the beginning of a new segment or a major topic shift
Negative Logits
ніципалі
-0.60
orap
-0.60
dav
-0.57
іга
-0.57
Chio
-0.56
ening
-0.56
idhi
-0.56
Merid
-0.55
mistak
-0.55
úly
-0.54
Positive Logits
↵
1.80
↵↵
1.51
</h4>
1.18
'])){
1.12
')));
1.09
)){
1.08
</h3>
1.08
"]];
1.06
"])){
1.06
"]);
1.06
Activations Density 0.110%