INDEX
Explanations
text fragments indicating a need for citation
references to citations or indications of required sourcing in the text
Negative Logits
Token      Logit
milo       -0.63
stood      -0.63
standing   -0.61
ipation    -0.60
chal       -0.59
runaway    -0.58
nen        -0.57
uming      -0.57
profits    -0.56
ittle      -0.56
Positive Logits
Token      Logit
]          1.05
]).        0.98
]),        0.97
])         0.96
]:         0.95
}.         0.92
]"         0.90
].         0.89
)]         0.86
]          0.85
Activations Density 0.035%