INDEX
Explanations
phrases indicating a choice or possibility
phrases indicating conditionality and potentiality
Negative Logits
ratulations        -0.70
itiz               -0.70
congratulations    -0.63
athing             -0.61
ãĥĥãĥĪ             -0.59
orer               -0.59
phabet             -0.58
eur                -0.58
Gone               -0.56
itary              -0.56
Positive Logits
expense            0.99
moment             0.97
glance             0.90
rate               0.86
mom                0.83
behest             0.82
point              0.82
cost               0.82
point              0.81
cost               0.80
Activations Density: 0.033%