INDEX
Explanations
comparisons or similarities
repeated phrases that express similarity or comparison
Negative Logits
Limited      -0.76
atis         -0.73
ulet         -0.73
ALE          -0.72
inion        -0.71
rift         -0.70
Ve           -0.68
UU           -0.67
bern         -0.66
duction      -0.66
Positive Logits
lihood            1.38
lier              0.93
ours              0.90
minded            0.78
liness            0.71
minded            0.70
fate              0.68
soDeliveryDate    0.68
liest             0.67
counterparts      0.66
Activations Density: 0.034%