Explanations
Phrases related to promises or commitments to end certain practices or conditions
Negative Logits
LETTE     -0.15
Pais      -0.14
iris      -0.14
alth      -0.14
vette     -0.14
antd      -0.14
locker    -0.14
udget     -0.13
ses       -0.13
065       -0.13
Positive Logits
ervas     0.16
ear       0.16
oj        0.16
หย        0.15
illard    0.15
ely       0.14
pto       0.14
icon      0.14
putas     0.14
ikh       0.13
Activation Density: 0.023%