Explanations
phrases related to time duration and expectations
Negative Logits
ěř      -0.19
lear    -0.15
istes   -0.14
ilm     -0.14
onis    -0.14
/////   -0.13
نین     -0.13
iland   -0.13
jez     -0.13
Cotton  -0.13
Positive Logits
took    0.51
Took    0.45
took    0.41
takes   0.39
Takes   0.35
take    0.35
taking  0.33
takes   0.33
cost    0.31
Take    0.28
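A sketch of how top positive and negative logit lists like these are commonly produced: the feature's decoder direction is projected through the model's unembedding matrix, and the extreme entries are read off. This assumes that convention. The dashboard may apply additional scaling or normalization, and the names `top_logit_effects`, `w_dec_row`, and `W_U`, along with the toy vocabulary and weights, are hypothetical.

```python
import numpy as np

def top_logit_effects(w_dec_row, W_U, vocab, k=10):
    """Project one feature's decoder direction through the unembedding
    (assumed convention) and return the k most negative and k most positive tokens."""
    logits = w_dec_row @ W_U          # shape: (vocab_size,)
    order = np.argsort(logits)        # ascending order of logit effect
    neg = [(vocab[i], float(logits[i])) for i in order[:k]]
    pos = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return neg, pos

# Hypothetical toy setup: 3-dim residual stream, 4-token vocabulary.
vocab = ["took", "takes", "Cotton", "lear"]
W_U = np.array([[1.0, 0.5, -0.2, -0.1],
                [0.0, 0.3,  0.0, -0.4],
                [0.2, 0.0, -0.3,  0.0]])
w_dec_row = np.array([0.5, 0.0, 0.5])

neg, pos = top_logit_effects(w_dec_row, W_U, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```

In this toy case "took" and "takes" come out on the positive side and "Cotton" and "lear" on the negative side, mirroring the shape (not the values) of the lists above.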
Activation Density: 0.143%
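Activation density is usually reported as the percentage of token positions in a sample where the feature's activation is above zero. A minimal sketch under that assumption follows. The function name, the `threshold` parameter, and the toy array of 143 active positions out of 100,000 are hypothetical and chosen only to reproduce a figure of the same magnitude.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`."""
    acts = np.asarray(acts)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Hypothetical toy data: 143 active positions out of 100,000 tokens.
acts = np.zeros(100_000)
acts[:143] = 1.0
density = activation_density(acts)
print(f"{density:.3f}%")
```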