INDEX
Explanations
phrases indicating a tendency or inclination toward a particular action or quality
Negative Logits
teil          -0.16
imizer        -0.16
WebResponse   -0.15
achie         -0.14
ttp           -0.14
igrations     -0.14
ëĭ¹           -0.14
ЧеÑĢ          -0.14
еÑĢп          -0.14
ienda         -0.14
Positive Logits
toward        0.21
erness        0.17
towards       0.17
664           0.15
Burnett       0.15
olk           0.14
tend          0.14
ye            0.14
tends         0.14
speaking      0.14
Activations Density: 0.021%