Explanations
phrases indicating significant consequences or commitments
Negative Logits
aminer   -0.16
Relax    -0.15
kart     -0.15
otal     -0.15
帯       -0.14
anc      -0.14
avor     -0.14
emann    -0.14
issen    -0.14
Hollow   -0.14
Positive Logits
Ïħνα     0.16
ISR      0.15
uddle    0.15
_wire    0.15
ibi      0.14
eni      0.14
blr      0.14
айд      0.14
odium    0.14
epy      0.14
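The two lists above contain the ten most negative and the ten most positive entries of the feature's per-token logit effect. The sketch below shows only that selection step. It assumes the effects are already available as a mapping from token string to score. On dashboards of this kind those scores are commonly the feature's decoder direction projected through the model's unembedding, but that part is an assumption and is not reproduced here. The function name `top_and_bottom_logits` is hypothetical.

```python
def top_and_bottom_logits(effects, k=10):
    """Return the k most positive and the k most negative (token, score) pairs.

    `effects` is a hypothetical dict mapping token strings to logit effects;
    how those effects were computed is outside the scope of this sketch.
    """
    # Sort ascending by score, so the most negative entries come first.
    ranked = sorted(effects.items(), key=lambda item: item[1])
    negative = ranked[:k]                   # most negative first
    positive = list(reversed(ranked[-k:]))  # most positive first
    return positive, negative


# Usage with a few of the rounded values listed above.
sample = {"Ïħνα": 0.16, "ISR": 0.15, "ibi": 0.14,
          "aminer": -0.16, "Relax": -0.15, "帯": -0.14}
pos, neg = top_and_bottom_logits(sample, k=2)
print(pos)  # [('Ïħνα', 0.16), ('ISR', 0.15)]
print(neg)  # [('aminer', -0.16), ('Relax', -0.15)]
```

With real data the input dictionary would cover the whole vocabulary. For large vocabularies, a partial selection such as `heapq.nlargest` and `heapq.nsmallest` avoids a full sort.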
Activation Density: 0.092%
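Activation density is the share of sampled tokens on which the feature fires. A density of 0.092% means the feature is active on roughly 92 of every 100,000 tokens. The sketch below assumes that per-token activations are given as a flat list of floats and that "fires" means strictly above a threshold of zero. The threshold and the function name `activation_density` are assumptions for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumes `activations` is a flat list of per-token feature activations;
    the default threshold of 0.0 is an assumption, not a documented setting.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# 92 active tokens out of 100,000 gives 0.092%.
acts = [0.0] * 99_908 + [1.0] * 92
print(activation_density(acts))
```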