INDEX
Explanations
phrases indicating comparison or suggestion
phrases that include the word "say" followed by a numerical value
NEGATIVE LOGITS
Token        Logit
SourceFile   -0.81
lete         -0.79
Write        -0.77
Äĩ           -0.74
quet         -0.74
omsky        -0.70
RM           -0.68
Bind         -0.67
vez          -0.66
irs          -0.66
POSITIVE LOGITS
Token        Logit
dozen        0.68
Leilan       0.67
Aph          0.64
uh           0.59
fractions    0.58
controlling  0.57
Rug          0.55
ties         0.55
Deg          0.55
tapping      0.55
Activations Density: 0.066%