INDEX
Explanations
phrases indicating comparisons and contrasts
Negative Logits
ÙĪØ±Ø´    -0.14
Jackie    -0.14
pard      -0.14
lish      -0.14
å¤        -0.14
231       -0.13
oit       -0.13
иÑĪ       -0.13
ë¹Ľ       -0.13
tro       -0.13
Positive Logits
gesture         0.21
attempt         0.19
way             0.19
measure         0.18
contribution    0.18
part            0.17
gift            0.17
Gesture         0.16
response        0.16
Contribution    0.15
Activations Density 0.144%