INDEX
Explanations
words associated with varying degrees of intensity or pressure
Negative Logits
BITS       -0.15
stitute    -0.15
liers      -0.14
immune     -0.14
ustum      -0.14
uin        -0.14
Cro        -0.14
.trip      -0.14
plers      -0.13
plib       -0.13
Positive Logits
minimum    0.15
оÑĩ        0.14
ful        0.14
Minimum    0.14
eft        0.14
çī§        0.14
عÙħاÙĦ     0.13
fu         0.13
hatt       0.13
grounds    0.13
Activations Density: 0.649%