INDEX
Explanations
phrases that indicate policy suggestions and recommendations for improvement
Negative Logits
eka          -0.15
bay          -0.15
stroy        -0.15
istrovství   -0.14
ibal         -0.14
ritz         -0.14
atte         -0.14
ทางการ       -0.14
ucha         -0.14
coded        -0.14
Positive Logits
proposal      0.29
proposals     0.26
suggestions   0.25
ideas         0.23
suggestion    0.23
proposal      0.22
Proposal      0.20
ideas         0.20
Ideas         0.20
propose       0.19
Activations Density 0.272%