INDEX
Explanations
significant nouns and verbs, particularly those related to time or transformations
Negative Logits
istinguish  -0.16
££          -0.16
lland       -0.16
actic       -0.15
ÙĦات        -0.15
_presence   -0.15
grams       -0.15
engo        -0.14
isNew       -0.14
çĴĥ         -0.14
Positive Logits
bust      0.15
nat       0.15
unden     0.14
idy       0.14
weekday   0.14
.Toolkit  0.13
.bus      0.13
ansk      0.13
onia      0.13
åħ        0.13
Activations Density 0.001%