Explanations
sections of academic papers, specifically focusing on introductions and conclusions
Introduction, conclusion, discussion
Negative Logits
Token         Logit
queſta        -0.77
styleType     -0.75
ſſung         -0.75
<unused68>    -0.73
<unused17>    -0.73
[@BOS@]       -0.73
<unused14>    -0.73
<unused16>    -0.73
<unused8>     -0.73
<unused3>     -0.73
Positive Logits
Token         Logit
Introduction  0.57
Introduction  0.55
INTRODUCTION  0.44
introduction  0.42
introduction  0.40
introducción  0.39
Introdu       0.37
INTRODUCTION  0.37
Introdu       0.35
rosca         0.34
Activations Density 0.002%