Explanations
phrases indicating contrasting or additional information
phrases that indicate contrast or exceptions to a prevailing idea
Negative Logits
SourceFile   -0.70
ULAR         -0.61
CLOSE        -0.60
mates        -0.60
ories        -0.60
tnc          -0.58
Same         -0.56
anium        -0.55
Yourself     -0.55
bound        -0.55
Positive Logits
anecd        0.75
alas         0.74
concedes     0.71
══           0.70
rhet         0.67
aptic        0.67
unlike       0.67
concede      0.66
ignores      0.66
according    0.65
Activations Density 0.096%