INDEX
Explanations
words associated with actions and processes, particularly those that imply judgment or consequence
Negative Logits
itra       -0.17
ismus      -0.14
âĶĢ        -0.14
ubber      -0.14
lsa        -0.14
ubic       -0.14
isp        -0.13
endas      -0.13
ems        -0.13
æŃ         -0.13
Positive Logits
oft        0.15
ocha       0.15
askell     0.15
assen      0.14
fony       0.14
.nt        0.14
CID        0.14
Salisbury  0.14
Hale       0.14
esco       0.14
Activations Density 0.002%