INDEX
Explanations
references to significant changes and transformations in context
Negative Logits
ONO       -0.14
itorio    -0.14
udge      -0.14
vit       -0.14
manner    -0.14
Mand      -0.14
MAND      -0.14
/thumb    -0.14
ouve      -0.13
ono       -0.13
Positive Logits
(change     0.17
-change     0.17
sworth      0.16
áng         0.16
(changes    0.16
ìĤ¬íķŃ      0.16
ÑĢÑı        0.15
ãģĻãģİ      0.15
change      0.15
/change     0.15
Activations Density 0.406%
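
Some positive-logit tokens (`ìĤ¬íķŃ`, `ÑĢÑı`, `ãģĻãģİ`) look like the printable stand-ins that byte-level BPE vocabularies use for raw UTF-8 bytes. The sketch below assumes the GPT-2 style byte-to-character table. The page does not name the tokenizer, so that is an assumption, and `decode_token` is a hypothetical helper rather than part of any dashboard API.

```python
def bytes_to_unicode():
    """Byte -> printable-character table used by GPT-2 style byte-level BPE."""
    # Bytes that are already printable map to themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at 256 and above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: display character -> original byte value.
_CHAR_TO_BYTE = {c: b for b, c in bytes_to_unicode().items()}


def decode_token(display: str) -> str:
    """Hypothetical helper: turn a vocabulary display string back into text.

    Assumes every character of `display` comes from the table above;
    byte sequences that are not valid UTF-8 decode to U+FFFD.
    """
    raw = bytes(_CHAR_TO_BYTE[ch] for ch in display)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ìĤ¬íķŃ", "ÑĢÑı", "ãģĻãģİ", "change"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption the three tokens decode to Korean, Cyrillic, and Japanese fragments, while plain ASCII tokens such as `change` pass through unchanged. If the model behind this page uses a different tokenizer, the table would need to be swapped for that tokenizer's own mapping.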