INDEX
Explanations
simple answers or statements
the term "simple" in various contexts
Negative Logits
largeDownload  -0.80
raints         -0.78
vance          -0.74
hips           -0.73
extensively    -0.71
ingle          -0.68
igham          -0.68
vigorously     -0.68
inburgh        -0.66
hovah          -0.65
Positive Logits
tons        1.25
minded      0.97
wallet      0.94
ton         0.85
arithmetic  0.83
json        0.82
ified       0.80
syrup       0.79
minded      0.79
ured        0.77
Activations Density 0.029%