INDEX
Explanations
expressions indicating belief or uncertainty
Negative Logits
RTLU              -0.63
therin            -0.59
dhury             -0.56
Hyman             -0.54
ocracy            -0.53
̀                 -0.53
autorytatywna     -0.52
Hig               -0.51
spodnie           -0.51
MessageTagHelper  -0.51
Positive Logits
يتيمه             0.70
oneself           0.65
person            0.60
člověk            0.60
είς               0.59
ешься             0.57
consultato        0.55
consulté          0.55
hopes             0.54
uação             0.54
Activations Density 0.119%