Explanations
- Phrases that indicate additional information or emphasize various points in a discussion
- Text following transition words
- Introducing further information
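Negative Logits
først: -0.60 (Danish/Norwegian, "first")
まずは: -0.60 (Japanese, "first of all")
nonetheless: -0.56
primarily: -0.56
nevertheless: -0.56
primarily: -0.55
eerst: -0.55 (Dutch, "first")
ابتدا: -0.54 (Persian, "at first")
icitis: -0.54
首先: -0.53 (Chinese, "first of all")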
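Positive Logits
Personendaten: 0.38 (German, "personal data")
added: 0.38
postsleuth: 0.36
ditambah: 0.35 (Indonesian/Malay, "added")
informée: 0.35 (French, "informed")
engraçadas: 0.34 (Portuguese, "funny")
Legături: 0.33 (Romanian, "links")
fordi: 0.33 (Danish/Norwegian, "because")
arXiv: 0.33
ężczy: 0.33 (likely a Polish subword, cf. "mężczyzna", "man")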
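The logit lists above rank output tokens by how strongly this feature pushes their logits down or up. A minimal sketch of how such rankings are commonly computed, assuming the effect is the dot product of the feature's decoder direction with each token's column of the unembedding matrix (ignoring any final layer norm). The function name `top_logits` and the toy weights are hypothetical, not the dashboard's actual code:

```python
from typing import List, Sequence, Tuple


def top_logits(
    decoder_dir: Sequence[float],
    w_unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive direct logit effects.

    Assumes w_unembed is laid out as [d_model][vocab_size], so the effect of
    the feature on token t is dot(decoder_dir, column t of w_unembed).
    """
    if len(decoder_dir) != len(w_unembed):
        raise ValueError("decoder_dir and w_unembed must share d_model")
    # Score every vocabulary entry. A list (not a dict) keeps duplicate
    # display strings, e.g. two tokenizer entries that both show as "primarily".
    scores = [
        (token, sum(d * row[t] for d, row in zip(decoder_dir, w_unembed)))
        for t, token in enumerate(vocab)
    ]
    ranked = sorted(scores, key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


# Hypothetical toy weights: d_model = 2, vocabulary of 3 tokens.
neg, pos = top_logits(
    decoder_dir=[1.0, -0.5],
    w_unembed=[[0.2, -0.4, 0.1], [0.6, 0.2, -0.2]],
    vocab=["added", "first", "because"],
    k=1,
)
print(neg, pos)
```

Sorting the full vocabulary once and slicing both ends keeps the sketch simple; for a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nsmallest`/`heapq.nlargest` would avoid the full sort.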
Activation density: 0.475% (the share of tokens on which this feature is active, roughly 1 token in 210)
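As a sketch, assuming density is the percentage of tokens on which the feature's activation is strictly above zero, the figure could be computed as below. The function name `activation_density` and its `threshold` parameter are hypothetical, added only to make the cutoff explicit:

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation is strictly above threshold."""
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# Hypothetical activations: the feature fires on 1 of 200 tokens.
print(activation_density([0.0] * 199 + [1.3]))  # prints 0.5
```

Because the input is consumed as an iterable, the same function works on a streamed pass over a large token corpus without holding every activation in memory.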