INDEX
Explanations
phrases starting with the word "a"
the indefinite articles "a" and "an"
NEGATIVE LOGITS
weights       -0.72
TDs           -0.70
orest         -0.68
osc           -0.68
antes         -0.68
ores          -0.67
ometers       -0.66
onto          -0.66
favourites    -0.66
itiz          -0.65
POSITIVE LOGITS
nutshell       1.38
twist          0.90
perverse       0.87
contradiction  0.81
brief          0.80
typical        0.79
footnote       0.79
statement      0.79
flurry         0.79
bizarre        0.78
Activations Density 0.070%
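
The density figure is easiest to read as a fraction of tokens. The sketch below assumes the usual meaning of activation density (the share of sampled tokens on which the feature has a nonzero activation); the page itself does not define it. The function name, the threshold parameter, and the sample data are hypothetical and serve only to show that 7 active tokens out of 10,000 correspond to 0.070%.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as "active" when its activation is strictly
    greater than the threshold (0.0 by default).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 7 active tokens among 10,000 (not data from this page).
sample = [0.0] * 9993 + [1.5] * 7
density = activation_density(sample)
print(f"{density:.3%}")
```

A density this low would mean the feature fires on fewer than one token in a thousand, which fits a feature tied to a narrow context such as the article "a" in set phrases.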