Explanations
phrases related to giving instructions or directives
phrases that instruct or prompt actions
Negative Logits
defe       -0.61
ean        -0.61
lik        -0.58
prophes    -0.58
NESS       -0.57
iege       -0.55
Lich       -0.54
ikawa      -0.54
ridge      -0.53
ode        -0.52
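The card does not say how these token scores are produced. A common approach in sparse-autoencoder dashboards is to project the feature's decoder direction through the model's unembedding matrix and then list the most negative and most positive entries. The sketch below assumes that approach. The function names `feature_logits` and `top_and_bottom`, the vocabulary, and all matrix values are hypothetical toy data and are not taken from this feature.

```python
# Hypothetical toy example: decoder direction, unembedding matrix and vocab
# are made up. A real dashboard would use the SAE decoder row (d_model floats)
# and the language model's unembedding matrix (d_model x vocab_size).

def feature_logits(decoder_dir, unembed):
    """Project one feature's decoder direction through the unembedding.

    decoder_dir: list of d_model floats.
    unembed: d_model rows, each a list of vocab_size floats.
    Returns one logit contribution per vocabulary token.
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(logits, vocab, k):
    """Return the k most positive and the k most negative (token, logit) pairs."""
    pairs = list(zip(vocab, logits))
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative


vocab = ["rid", "away", "defe", "ean"]
decoder_dir = [1.0, -0.5]
unembed = [
    [1.0, 0.5, -0.5, 0.0],
    [-0.2, -0.4, 0.2, 1.0],
]
logits = feature_logits(decoder_dir, unembed)
positive, negative = top_and_bottom(logits, vocab, k=2)
print(positive)  # the two highest-scoring toy tokens
print(negative)  # the two lowest-scoring toy tokens
```

Under this assumption, both lists on the card come from a single projection. The negative list holds the tokens whose logits the feature suppresses most.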
Positive Logits
rid                     1.14
TING                    1.11
away                    0.96
aways                   0.89
acquainted              0.79
tin                     0.78
cloneembedreportprint   0.78
ãĥ³ãĤ¸ (ンジ)            0.78
Started                 0.74
Away                    0.72
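The token ãĥ³ãĤ¸ is not corrupted text. It is how a GPT-2-style byte-level BPE vocabulary displays the UTF-8 bytes of the Japanese string ンジ. The presence of cloneembedreportprint, a known GPT-2 vocabulary entry, suggests such a tokenizer, but the card does not name the model, so treat this as an assumption. The sketch rebuilds the byte-to-character table used by GPT-2-style tokenizers and inverts it. `decode_token` is a hypothetical helper name.

```python
def bytes_to_unicode():
    """Byte -> printable character table used by GPT-2-style byte-level BPE."""
    # Printable Latin-1 bytes map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    # Remaining bytes (control chars, space, etc.) are shifted to 256 + n.
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


def decode_token(token):
    """Map a displayed byte-level token back to its UTF-8 text (hypothetical helper)."""
    char_to_byte = {c: b for b, c in bytes_to_unicode().items()}
    raw = bytes(char_to_byte[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


print(decode_token("ãĥ³ãĤ¸"))  # prints ンジ
```

Pure-ASCII tokens such as cloneembedreportprint decode to themselves. Only tokens containing non-ASCII bytes look garbled in the raw vocabulary.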
Activation Density: 0.081%
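A density of 0.081% means the feature fires on roughly 8 of every 10,000 tokens. The sketch below assumes density is the fraction of token positions whose activation exceeds zero. Some tools use a different threshold, so the `threshold` parameter and the function name `activation_density` are hypothetical. The activations are toy data built to reproduce the card's figure, not this feature's real activations.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature fires above `threshold`.

    `threshold` is a hypothetical parameter; 0.0 counts any positive
    activation as firing.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Toy data: the feature fires at 81 of 100,000 token positions.
acts = [0.0] * 100_000
for i in range(0, 81 * 1000, 1000):
    acts[i] = 2.5
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.081%
```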