Explanations
phrases that express obligations or recommendations
Negative Logits
Token     Logit
iske      -0.17
Verde     -0.16
elage     -0.16
phia      -0.15
oras      -0.15
uge       -0.14
atron     -0.14
Spot      -0.14
grund     -0.14
uras      -0.13
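The page does not say how these logit values are produced. A common approach for sparse-autoencoder features, assumed here rather than confirmed by this page, is to multiply the feature's decoder direction by the model's unembedding matrix and read off the most negative and most positive entries of the result. The sketch below does that with plain Python lists. The names top_logit_tokens, decoder_dir and unembed are hypothetical, and any final LayerNorm scaling a real pipeline might apply is ignored.

```python
from typing import List, Sequence, Tuple


def top_logit_tokens(
    decoder_dir: Sequence[float],          # hypothetical: feature decoder vector, length d_model
    unembed: Sequence[Sequence[float]],    # hypothetical: W_U as d_model rows of d_vocab entries
    vocab: Sequence[str],                  # token string for each vocabulary index
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive tokens for a feature direction."""
    d_vocab = len(vocab)
    # Direct logit effect of the feature on each token: decoder_dir @ W_U
    logits = [
        sum(decoder_dir[m] * unembed[m][t] for m in range(len(decoder_dir)))
        for t in range(d_vocab)
    ]
    ranked = sorted(range(d_vocab), key=lambda t: logits[t])  # ascending by logit
    negative = [(vocab[t], logits[t]) for t in ranked[:k]]
    positive = [(vocab[t], logits[t]) for t in reversed(ranked[-k:])]
    return negative, positive


# Toy example: 2-dimensional residual stream, 3-token vocabulary
neg, pos = top_logit_tokens([1.0, 0.0], [[0.5, -0.2, 0.1], [9.0, 9.0, 9.0]], ["a", "b", "c"], k=1)
print(neg, pos)  # [('b', -0.2)] [('a', 0.5)]
```

Under this assumption the negative and positive lists are simply the two ends of one ranking over the whole vocabulary.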
Positive Logits
Token     Logit
igham     0.17
Gim       0.15
Barrett   0.14
ashamed   0.14
Bà        0.13
bai       0.13
<0xBD>数  0.13
ercise    0.13
ーン      0.13
-title    0.13
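In the raw dashboard output, several token strings ("BÃł", "½æķ°", "ãĥ¼ãĥ³") look garbled because they are written in the byte-to-character alphabet of GPT-2-style byte-level BPE, where every byte is mapped to a printable character. Assuming the model uses such a tokenizer (the page does not say), the sketch below inverts that mapping and decodes the bytes as UTF-8: "ãĥ¼ãĥ³" becomes "ーン", "BÃł" becomes "Bà", and "½æķ°" begins with a stray continuation byte (0xBD) followed by "数", so it only partly decodes. The helper names decode_token and CHAR_TO_BYTE are hypothetical; the table construction follows the GPT-2 byte encoder.

```python
def bytes_to_unicode() -> dict:
    """GPT-2 style reversible map from each byte (0-255) to a printable character."""
    # Bytes that are already printable keep their own code point.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Remaining bytes (space, control characters, etc.) are shifted to 256 + n.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: printable character -> original byte value
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a byte-level BPE token string back into readable text."""
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    # A token can split a multi-byte character; invalid bytes become U+FFFD.
    return raw.decode("utf-8", errors="replace")


for tok in ["BÃł", "½æķ°", "ãĥ¼ãĥ³", "igham"]:
    print(repr(tok), "->", repr(decode_token(tok)))
```

Using errors="replace" keeps partial tokens such as "½æķ°" from raising an exception, at the cost of showing the orphaned byte as a replacement character.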
Activation Density: 0.243%
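The page gives only the density figure. Activation density is commonly reported as the share of token positions on which a feature fires; that definition, the zero threshold, and the function name activation_density are assumptions in the sketch below. Under that reading, 0.243% means the feature is active on roughly one token in 410 (1 / 0.00243 ≈ 411.5).

```python
from typing import Sequence


def activation_density(activations: Sequence[Sequence[float]], threshold: float = 0.0) -> float:
    """Fraction of token positions whose feature activation exceeds `threshold`.

    activations: one list of per-token activations per sequence (hypothetical input format).
    """
    flat = [a for seq in activations for a in seq]
    if not flat:
        return 0.0  # no tokens observed
    return sum(1 for a in flat if a > threshold) / len(flat)


# Two toy sequences of four tokens each; the feature fires on 2 of 8 positions
acts = [[0.0, 0.0, 1.2, 0.0], [0.0, 0.3, 0.0, 0.0]]
print(f"{activation_density(acts):.3%}")
```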