Explanations
phrases emphasizing the purpose or intended outcome of actions
Negative Logits
consci     -0.47
Nowak      -0.47
itſelf     -0.47
victime    -0.46
catena     -0.44
trama      -0.43
bottes     -0.42
roue       -0.42
straff     -0.42
Clegg      -0.41
Positive Logits
for        1.34
For        1.21
for        1.20
FOR        1.14
For        1.14
สำหรับ     1.08
für        1.05
voor       1.05
untuk      1.04
для        1.03
Activations Density 0.790%