INDEX
Explanations
phrases related to invitations or offers of participation
Negative Logits
going      -0.19
rit        -0.15
heading    -0.15
861        -0.15
Going      -0.14
going      -0.14
ami        -0.14
aze        -0.14
ay         -0.14
rh         -0.14
Positive Logits
into       0.42
onto       0.36
into       0.35
Into       0.33
Into       0.29
INTO       0.29
onto       0.27
vào        0.26
_into      0.26
.into      0.23
Activations Density 0.061%