Explanations
words related to political discussions and actions
Negative Logits
hof         -0.17
ãĥģ         -0.15
Barber      -0.15
py          -0.15
iek         -0.14
cap         -0.14
dbus        -0.14
etler       -0.13
609         -0.13
/release    -0.13
Positive Logits
wner        0.17
wer         0.16
tượng       0.15
erif        0.15
patron      0.15
WARE        0.15
è¯Ŀ         0.15
WER         0.14
TMPro       0.14
arl         0.14
Activations Density 0.018%