Explanations
phrases or clauses introducing contrasting or explanatory information
repetitions of the word "that"
Negative Logits
Token    Logit
acca     -0.80
oras     -0.69
arine    -0.67
orse     -0.67
kay      -0.66
aro      -0.66
WT       -0.66
ña       -0.66
Forty    -0.65
tnc      -0.65
Positive Logits
Token         Logit
nonetheless   1.49
nevertheless  1.40
alas          0.98
etheless      0.95
still         0.92
beware        0.86
persisted     0.84
fortunately   0.78
persists      0.78
never         0.77
Activations Density 0.370%