INDEX
Explanations
phrases indicating actions or opinions related to individuals or groups
Negative Logits
Token       Logit
ulk         -0.15
ë´ī         -0.14
ignon       -0.14
пон         -0.14
iÅŁim       -0.14
atism       -0.14
ingleton    -0.14
uy          -0.14
æĪIJ人      -0.14
Jako        -0.14
Positive Logits
Token       Logit
wait        0.20
wait        0.20
guarante    0.19
waited      0.18
.wait       0.18
ready       0.17
waiting     0.17
WAIT        0.17
guarantee   0.17
Wait        0.17
Activations Density 0.008%
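
For readers unfamiliar with the density figure, the following is a minimal sketch of one common way such a number can be computed: the fraction of token positions on which the feature fires. This is an assumption about the metric, not the dashboard's confirmed method. The function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature is active; the dashboard's exact definition is not stated.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy data (hypothetical): the feature fires on 8 of 100,000 token positions.
toy_activations = [0.0] * 99_992 + [1.3] * 8
density = activation_density(toy_activations)
print(f"{density:.3%}")  # prints "0.008%"
```

Under this reading, a density of 0.008% would mean the feature is active on roughly 8 tokens in every 100,000, which is typical of a sparse, narrowly selective feature.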