INDEX
Explanations
phrases denoting contrast or exception
conditional phrases and clauses that introduce qualifications or exceptions
Negative Logits
enced         -0.65
Heller        -0.59
Yuan          -0.57
encyclopedia  -0.56
Kaplan        -0.55
Turk          -0.54
Hat           -0.53
atten         -0.52
Lam           -0.51
Sund          -0.51
Positive Logits
guiActiveUn     0.84
pton            0.79
vc              0.74
soever          0.74
FW              0.71
tan             0.70
EVA             0.70
unders          0.68
TPPStreamerBot  0.67
iton            0.66
Activations Density 0.469%