Explanations
text related to regulations or structured documents such as paragraphs
references to specific paragraphs or sections within a text
Negative Logits
eer           -0.79
awaru         -0.71
ocker         -0.69
ereo          -0.68
CV            -0.68
xon           -0.67
ayne          -0.66
ENSE          -0.66
eus           -0.65
asio          -0.65
Positive Logits
witz          1.06
acters        0.87
agraph        0.86
paragraph     0.85
paragraph     0.83
views         0.77
paragraphs    0.74
commenting    0.74
"$:/          0.71
subparagraph  0.69
Activations Density 0.018%