INDEX
Explanations
words that indicate agreement, concession, or logical consequence, such as "allows", "agree", "therefore", "however", and "accordingly".
sentence starters and transitional phrases that indicate progression or sequence in text.
Negative Logits
<eos>    -0.38
ff    -0.32
jeta    -0.31
ffar    -0.31
<bos>    -0.31
חיצוניים    -0.31
pri    -0.30
pe    -0.30
λοι    -0.30
te    -0.30
Positive Logits
Theſe    0.52
becauſe    0.49
myſelf    0.46
AddTagHelper    0.45
виправивши    0.44
فريبيس    0.43
Мексичка    0.43
expandindo    0.40
Jefus    0.40
ſeveral    0.40
Activations Density: 0.789%