INDEX
Explanations
numerical values and their associated formats or statistics in the text
Negative Logits
cauſe       -0.83
purpoſe     -0.78
ſtate       -0.77
pleaſure    -0.72
caufe       -0.72
theless     -0.72
समीक्षक     -0.68
houſe       -0.68
myſelf      -0.68
خارجية      -0.67
Positive Logits
making      1.01
being       0.98
giving      0.94
taking      0.87
using       0.86
having      0.81
getting     0.80
utilizing   0.78
bringing    0.78
running     0.75
Activations Density 0.602%