Explanations
phrases that contrast with or add information to the preceding text
references to conditional statements or clauses
NEGATIVE LOGITS
Rac            -0.79
PLUS           -0.76
chwitz         -0.67
ゴ             -0.66
lette          -0.65
。             -0.61
Yok            -0.61
onds           -0.61
Indra          -0.61
Chevron        -0.60
POSITIVE LOGITS
technically    1.13
admittedly     0.84
differ         0.84
differed       0.83
initially      0.82
somewhat       0.79
theoretically  0.79
undeniably     0.78
disagree       0.77
differs        0.76
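Lists like the two above are commonly produced by projecting the feature's decoder direction onto the model's unembedding matrix and keeping the most negative and most positive entries. The sketch below assumes that procedure and applies no extra normalization; the function names `logit_effects` and `top_and_bottom`, the dictionary layout, and the toy vectors are hypothetical and are not taken from the tool that generated this page.

```python
from typing import Dict, List, Tuple


# Hypothetical helper: the logit effect of a feature on each token, taken as the
# dot product of the feature's decoder direction with that token's unembedding
# column. Assumes no layer-norm folding or other rescaling is applied.
def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    return {
        token: sum(d * w for d, w in zip(decoder_dir, column))
        for token, column in unembed.items()
    }


# Hypothetical helper: return the k most positive and the k most negative
# tokens, each list ordered from the strongest effect downward.
def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    ranked = sorted(effects.items(), key=lambda item: item[1])  # ascending by effect
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional example; the vectors are made up for illustration only.
    effects = logit_effects(
        [1.0, -0.5],
        {"technically": [1.0, 0.0], "differ": [0.5, 0.5], "Rac": [-0.5, 0.5]},
    )
    positive, negative = top_and_bottom(effects, k=2)
    print("POSITIVE:", positive)
    print("NEGATIVE:", negative)
```

With `k=10` this yields two ten-entry lists shaped like the ones above; the actual values depend on the model's weights and on any normalization the real tool applies, which this sketch does not model.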
Activation density: 0.399%
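Activation density is usually read as the share of sampled token positions on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above a threshold of zero and that activations arrive as one list per sequence; the function name `activation_density` and its `threshold` parameter are hypothetical.

```python
from typing import Iterable, Sequence


# Hypothetical helper: percentage of token positions whose activation exceeds
# `threshold` (assumed to be 0.0, i.e. "the feature fired at this token").
def activation_density(activations: Iterable[Sequence[float]],
                       threshold: float = 0.0) -> float:
    total = 0
    active = 0
    for sequence in activations:
        for value in sequence:
            total += 1
            if value > threshold:
                active += 1
    if total == 0:
        raise ValueError("no token positions supplied")
    return 100.0 * active / total


if __name__ == "__main__":
    # Toy data: two sequences of four tokens, the feature fires on two of eight.
    print(activation_density([[0.0, 1.2, 0.0, 0.0], [0.0, 0.0, 0.0, 3.4]]))
```

Under this reading, 0.399% means the feature fires on roughly 4 of every 1,000 sampled tokens.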