INDEX
Explanations
phrases indicating a distinct comparison or exception
phrases indicating a contrast or exception
Negative Logits
tumble
-0.57
tailed
-0.52
hunted
-0.52
":[{"
-0.52
tyr
-0.50
"},"
-0.48
toast
-0.48
reproduce
-0.48
OTT
-0.47
prus
-0.47
Positive Logits
heid
1.24
icularly
0.99
ments
0.86
from
0.83
ively
0.83
ensive
0.82
ranging
0.80
ional
0.78
orial
0.77
selves
0.77
Activations Density 0.030%