Explanations
intensifiers and evaluative adjectives
NEGATIVE LOGITS
Token        Logit
fusc         -0.16
noc          -0.15
addock       -0.14
lopedia      -0.14
aurus        -0.14
输           -0.14
anmar        -0.14
isodes       -0.14
onta         -0.14
lew          -0.14
POSITIVE LOGITS
Token        Logit
ify          0.16
rium         0.15
azard        0.14
tempted      0.14
Same         0.14
ンダ         0.14
ament        0.14
dj           0.14
coincidence  0.13
enough       0.13
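The two lists above rank the tokens whose logits this feature most raises and most lowers. A common way to produce such lists, assumed here rather than confirmed by this page, is to project the feature's decoder direction through the model's unembedding matrix and keep the top and bottom k entries. The sketch below is plain Python; `decoder_dir`, `unembed`, `vocab`, and both helper functions are hypothetical names, and the toy numbers are illustrative only, not this feature's real weights.

```python
from typing import List, Tuple

def logit_effects(decoder_dir: List[float], unembed: List[List[float]]) -> List[float]:
    """Project a feature's decoder direction through the unembedding.

    Assumption: `unembed` is d_model x vocab_size (rows are residual dims),
    so the effect on token j is sum_i decoder_dir[i] * unembed[i][j].
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        for j in range(vocab_size)
    ]

def top_and_bottom(
    effects: List[float], vocab: List[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, effect) pairs."""
    pairs = list(zip(vocab, effects))
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative

# Toy example: 2-dimensional residual stream, 4-token vocabulary (hypothetical values).
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
unembed = [
    [1.0, 0.5, -1.0, -0.5],
    [0.0, 0.5, 0.0, 0.0],
]
decoder_dir = [0.16, 0.14]

effects = logit_effects(decoder_dir, unembed)
pos, neg = top_and_bottom(effects, vocab, k=2)
print("positive:", pos)
print("negative:", neg)
```

In a real model the same projection is usually done as a single matrix-vector product with an array library; the explicit loops here only make the arithmetic visible.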
Activation density: 0.166%
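Activation density is typically reported as the share of tokens in an evaluation sample on which the feature fires. Assuming "fires" means an activation above zero, 0.166% corresponds to about 1.7 active tokens per thousand. A minimal sketch of that calculation follows; `activation_density` and the `activations` sample are hypothetical, and the zero threshold is an assumption.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Assumption: a token counts as active when its activation is > 0,
    as with a ReLU-style sparse autoencoder.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Toy sample: 166 active tokens out of 100,000 (one every 600 positions).
sample = [0.0] * 100_000
for i in range(0, 166 * 600, 600):
    sample[i] = 1.0

print(f"{activation_density(sample):.3f}%")
```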