Explanations
actions related to change, movement, or transformation
Negative Logits
Heard        -0.17
pon          -0.15
кта          -0.15
erv          -0.14
يه           -0.14
son          -0.13
extreme      -0.13
everything   -0.13
ect          -0.13
ree          -0.13
Positive Logits
instead      0.17
Instead      0.17
Instead      0.16
coni         0.15
DEM          0.15
instead      0.14
icone        0.14
休           0.14
omor         0.14
ente         0.14
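Feature dashboards like this one commonly produce the two logit lists by projecting the feature's decoder direction through the model's unembedding matrix, then taking the most negative and most positive entries. The sketch below shows that idea under stated assumptions. The function names `logit_effects` and `top_and_bottom` are hypothetical, and the toy vectors are illustrative, not real model weights. The dashboard's exact procedure, including any normalization, is not given here.

```python
def logit_effects(decoder_dir, unembed):
    """Dot the feature's decoder direction with each vocab token's unembedding row.

    Assumption: unembed is a list of rows, one per vocabulary token,
    each with the same length (d_model) as decoder_dir.
    """
    return [sum(d * u for d, u in zip(decoder_dir, row)) for row in unembed]


def top_and_bottom(effects, vocab, k):
    """Return the k most positive and k most negative (token, effect) pairs."""
    order = sorted(range(len(effects)), key=lambda i: effects[i])
    negative = [(vocab[i], effects[i]) for i in order[:k]]
    # Largest effects first for the positive list.
    positive = [(vocab[i], effects[i]) for i in reversed(order[-k:])]
    return positive, negative


# Toy example: a 2-dimensional model with a 3-token vocabulary (hypothetical values).
vocab = ["instead", "Heard", "pon"]
effects = logit_effects([1.0, 0.0], [[0.5, 0.1], [-0.4, 0.2], [0.1, 0.9]])
positive, negative = top_and_bottom(effects, vocab, k=1)
print(positive)  # [('instead', 0.5)]
print(negative)  # [('Heard', -0.4)]
```

Sorting once and slicing both ends keeps the two lists consistent with each other. With a real vocabulary of tens of thousands of tokens, a partial sort such as `heapq.nlargest` would avoid sorting every entry.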
Activation Density: 0.033%
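Activation density is usually reported as the percentage of tokens in a sample corpus on which the feature fires. The sketch below assumes that "fires" means an activation strictly above zero. The dashboard's actual threshold and sample size are not stated here, and `activation_density` is a hypothetical helper. Under that reading, 0.033% means the feature is active on roughly 33 of every 100,000 tokens.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds threshold.

    Assumption: threshold=0.0 stands in for whatever cutoff the dashboard uses.
    """
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# 33 firing tokens out of 100,000 (hypothetical activations).
print(activation_density([0.0] * 99_967 + [2.5] * 33))  # 0.033
```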