INDEX
Explanations
descriptions or mentions of things that are simple
repeated references to the concept of simplicity
Negative Logits
Token           Logit
hovah           -0.77
vance           -0.74
hips            -0.74
largeDownload   -0.73
extensively     -0.73
raints          -0.68
vigorously      -0.67
umar            -0.67
reon            -0.66
ibal            -0.66
Positive Logits
Token           Logit
tons            1.41
minded          1.01
arithmetic      0.99
syrup           0.97
ton             0.96
minded          0.96
wallet          0.89
pleasures       0.87
sugars          0.84
explanation     0.80
Activations Density: 0.049%
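
For readers unfamiliar with the density figure, the sketch below shows one common way such a number could be computed: the fraction of token positions on which the feature's activation exceeds a threshold. This is an assumption about the metric, not a statement of how this page derives it; the function name `activation_density`, the zero threshold, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions on which the
    feature is active. The dashboard's exact definition may differ.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 token positions, 5 of which are active.
sample = [0.0] * 9995 + [0.8, 1.2, 0.3, 2.1, 0.5]
density = activation_density(sample)
print(f"{density:.3%}")
```

A density well below 1% indicates a sparse feature, one that is active on only a small share of tokens.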