INDEX
Explanations
instances of transitions between different phrases or concepts
phrases that introduce paraphrased statements or explanations
Negative Logits
oly         -0.79
ometer      -0.71
igators     -0.64
roy         -0.64
crocod      -0.62
aciously    -0.61
arov        -0.61
minster     -0.60
ometers     -0.59
atars       -0.59
Positive Logits
ACTIONS     0.73
"[          0.71
unless      0.70
FTWARE      0.68
Interested  0.68
barring     0.68
excluding   0.66
yes         0.65
ال          0.64
yeah        0.64
Activations Density 0.145%
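
As a rough illustration of what a density figure like the one above may measure, here is a minimal sketch. It assumes density means the fraction of token positions at which the feature's activation exceeds a threshold; that definition, the function name `activation_density`, the threshold of 0.0, and the sample data are all hypothetical and not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of token positions where the
    feature fires at all. The tool behind this page may define it differently.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 20,000 token positions, with the feature firing on 29.
sample = [0.0] * 19971 + [0.5] * 29
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.145%
```

Under this reading, a density of 0.145% would mean the feature is active on roughly 1 or 2 of every 1,000 tokens.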