INDEX
Explanations
conjunctions followed by content that introduces a contrasting or opposite idea
instances of the word "But" indicating contrast or exception
Negative Logits
heads       -0.69
segment     -0.68
.","        -0.61
fell        -0.60
sym         -0.59
¯¯¯¯        -0.58
ceremony    -0.57
award       -0.56
ization     -0.56
paths       -0.55
Positive Logits
tons            1.33
romeda          0.93
alas            0.90
theless         0.88
withstanding    0.85
thodox          0.83
chers           0.83
ts              0.82
tif             0.79
anamo           0.78
Activations Density 0.067%