INDEX
Explanations
references to pledges, promises, and commitments
Negative Logits
Token     Logit
orz       -0.16
بط        -0.16
vidia     -0.15
quo       -0.15
geile     -0.15
اسطة      -0.15
unkt      -0.14
Tcp       -0.14
सल        -0.13
oucher    -0.13
Positive Logits
Token        Logit
promise      0.92
promises     0.84
Promise      0.77
commitments  0.71
commitment   0.70
promise      0.70
pledge       0.69
promised     0.69
Promise      0.67
pledges      0.64
Activations Density: 0.394%
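
The density figure can be read as the share of sampled token positions on which the feature fires. That reading is an assumption, since the listing does not define the term. Under it, a minimal sketch of the arithmetic looks like the following. The function name `activation_density`, the `threshold` parameter, and the sample activations are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means (active positions) / (all sampled positions).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1000 token positions, 4 of which activate the feature.
sample = [0.0] * 996 + [1.2, 0.3, 2.5, 0.8]
print(f"{activation_density(sample):.3%}")  # prints 0.400%
```

On this reading, a density of 0.394% would mean the feature is active on roughly 4 of every 1000 sampled tokens, which fits a feature that responds to a narrow set of words such as "promise" and "pledge".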