INDEX
Explanations
words related to physical actions or processes
Negative Logits
ritz        -0.15
grown       -0.15
обов        -0.15
onda        -0.15
iders       -0.14
大人        -0.14
orta        -0.14
long        -0.13
á»ĵ         -0.13
paid        -0.13
Positive Logits
lessly      0.25
aneously    0.22
ishly       0.20
ily         0.19
ize         0.19
astically   0.18
istically   0.18
uously      0.18
ify         0.18
itize       0.17
Activations Density: 0.191%