INDEX
Explanations
phrases or terms within square brackets
the closing brackets or brackets in the text
Negative Logits
neighbors   -0.52
Twilight    -0.51
blight      -0.50
ro          -0.49
narrowly    -0.48
sa          -0.47
lo          -0.45
steel       -0.45
lur         -0.45
ever        -0.45
Positive Logits
].     3.76
]."    3.17
].     2.99
]).    2.90
],     2.77
];     2.77
],"    2.68
.]     2.65
]:     2.51
!]     2.38
Activations Density 0.008%