Explanations
phrases indicating expressed thoughts or feelings
Negative Logits
<eos>            -0.57
↵↵↵              -0.56
</blockquote>    -0.55
↵                -0.55
↵↵               -0.53
↵↵↵↵             -0.52
↵↵↵↵↵            -0.50
contra           -0.48
-                -0.47
again            -0.47
Positive Logits
DrawerToggle     0.81
]<<"             0.78
PopupWindow      0.70
houſe            0.68
Houſe            0.66
voulait          0.64
AndEndTag        0.64
SAX              0.63
purpoſe          0.63
出版年           0.62
Activations Density 0.166%