INDEX
Explanations
phrases indicating some sort of citation or quote
punctuation marks and their associated significance in the text
Negative Logits
https      -0.66
Travis     -0.63
Thomas     -0.61
http       -0.59
Hamilton   -0.59
Couch      -0.58
pic        -0.57
HT         -0.57
Bengals    -0.56
Vit        -0.56
Positive Logits
".[    3.94
."[    3.86
).[    2.66
.[     2.34
"[     2.28
,[     2.12
:[     2.04
)[     1.82
[/     1.79
!".    1.58
Activations Density: 0.010%