Explanations
mentions of "bottom line" or related phrases indicating conclusions or summaries
Negative Logits
urous        -0.18
.LogWarning  -0.16
erken        -0.15
иÑģ          -0.15
uess         -0.15
ema          -0.14
patches      -0.14
ooth         -0.14
olate        -0.14
.LogError    -0.14
Positive Logits
/top     0.21
bottom   0.19
Bottom   0.19
most     0.19
(bottom  0.19
ycastle  0.18
.Bottom  0.18
-bottom  0.18
BOTTOM   0.17
BOTTOM   0.17
Activations Density 0.019%