Explanations
instances of the word "finish" in various forms and contexts
Negative Logits
oria      -0.15
age       -0.15
atte      -0.14
SI        -0.14
tak       -0.14
inda      -0.13
itness    -0.13
onte      -0.13
sdale     -0.13
/from     -0.13
Positive Logits
touches   0.23
off       0.22
er        0.21
egan      0.20
eren      0.19
/start    0.18
-off      0.18
erd       0.18
ings      0.17
TOUCH     0.17
Activations Density: 0.029%