INDEX
Explanations
phrases related to emotional impact and negative consequences of actions
Negative Logits
Token       Logit
-Mart       -0.16
åŁ          -0.15
zion        -0.15
оÑĢаз       -0.14
íĭ±         -0.14
ibo         -0.14
.lt         -0.14
indsight    -0.14
zew         -0.14
.crt        -0.14
Positive Logits
Token       Logit
onto        0.40
onto        0.36
into        0.29
into        0.24
INTO        0.21
Ont         0.20
Into        0.19
unto        0.19
upon        0.19
_into       0.19
Activations Density: 0.267%