Explanations
expressions of opinion, belief, or assessment
Negative Logits
çŁ¢         -0.17
hazi        -0.17
efa         -0.15
essen       -0.14
oke         -0.14
emento      -0.14
finances    -0.14
ken         -0.14
mar         -0.14
ef          -0.14
Positive Logits
borough     0.15
ocos        0.14
ONGL        0.14
ulp         0.14
stuff       0.14
认为        0.14
Lance       0.14
inz         0.14
Cena        0.14
Ù쨶ÙĦ      0.14
Activations Density: 0.246%