INDEX
Explanations
phrases that emphasize the result or outcome of an action or situation
Negative Logits
>=",
-0.75
Hentet
-0.61
بيها
-0.57
Beauchamp
-0.56
Appleton
-0.56
INGRED
-0.53
>{@
-0.53
bastien
-0.53
xious
-0.52
awtextra
-0.52
Positive Logits
Оно
0.84
оно
0.82
its
0.79
itself
0.70
it
0.63
它
0.62
Its
0.61
它
0.59
Its
0.57
themselves
0.56
Activations Density 0.150%