INDEX
Explanations
phrases indicating amounts, comparisons, or significant numerical data
Negative Logits
.blob       -0.15
же          -0.15
anten       -0.14
wij         -0.14
çĴ°         -0.13
ptime       -0.13
Immutable   -0.13
ughter      -0.13
Sant        -0.13
eç          -0.13
Positive Logits
bust        0.16
inski       0.15
rens        0.14
yx          0.14
ahren       0.14
isher       0.14
iani        0.14
isser       0.14
eneg        0.14
Closet      0.14
Activations Density: 1.428%