Explanations
written explanations or descriptions within a text
references to documents or transcripts
Negative Logits
aneers        -0.92
alties        -0.86
ãĥĺ           -0.85   (byte-level BPE rendering of the katakana ヘ)
¨             -0.82
entimes       -0.81
untu          -0.81
Downloadha    -0.79
cles          -0.77
abilities     -0.76
gas           -0.76
Positive Logits
rundown        1.09
breakdown      1.07
synopsis       0.99
kicker         0.98
excerpt        0.98
caveat         0.91
overview       0.90
reasoning      0.89
link           0.88
backstory      0.86
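Logit lists like the Positive and Negative Logits above are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and ranking the resulting per-token logit contributions. The sketch below assumes that procedure. The function names, the two-dimensional toy model, and its five-token vocabulary are hypothetical, and the numbers it produces are not the ones listed above.

```python
from typing import List, Sequence, Tuple

TokenScore = Tuple[str, float]


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction through the unembedding.

    Assumption: decoder_dir has length d_model and unembed has d_model rows
    with one column per vocabulary entry. Returns one logit per token.
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(logits: Sequence[float], vocab: Sequence[str],
                   k: int) -> Tuple[List[TokenScore], List[TokenScore]]:
    """Return the k most positive and the k most negative (token, logit) pairs."""
    order = sorted(range(len(logits)), key=lambda t: logits[t])  # ascending
    negative = [(vocab[t], logits[t]) for t in order[:k]]
    positive = [(vocab[t], logits[t]) for t in reversed(order[-k:])]
    return positive, negative


# Hypothetical toy model: d_model = 2, five-token vocabulary.
vocab = ["rundown", "breakdown", "aneers", "alties", "the"]
unembed = [
    [1.0, 0.9, -0.8, -0.7, 0.0],
    [0.2, 0.2, -0.2, -0.2, 0.1],
]
decoder_dir = [1.0, 0.5]

positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed), vocab, k=2)
print("Positive:", [(tok, round(v, 2)) for tok, v in positive])
print("Negative:", [(tok, round(v, 2)) for tok, v in negative])
```

Sorting once and slicing both ends keeps the positive and negative lists consistent with each other; for a real vocabulary of tens of thousands of tokens, a partial selection (for example `heapq.nlargest`) would avoid the full sort.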
Activation Density: 0.079%
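The density figure is usually read as the share of token positions on which the feature fires at all; at 0.079%, that is roughly 79 positions per 100,000. A minimal sketch, assuming "fires" means an activation strictly above zero. The threshold, the function name, and the sample data are hypothetical, and the corpus behind the dashboard's figure is not stated here.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the feature activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 100,000 token positions, of which 79 fire.
acts = [0.0] * 100_000
for pos in range(0, 7_900, 100):  # 79 positions: 0, 100, ..., 7_800
    acts[pos] = 1.0

density = activation_density(acts)
print(f"Activation density: {density:.3%}")  # prints "Activation density: 0.079%"
```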