INDEX
Explanations
numerical values and symbols in mathematical expressions
Negative Logits
↵↵      -0.79
face    -0.56
ro      -0.56
flight  -0.55
="";    -0.54
Mid     -0.54
—       -0.54
over    -0.53
#%%     -0.53
T       -0.52
Positive Logits
myſelf      1.07
uſed        0.99
itſelf      0.99
himſelf     0.97
marseille   0.93
doubtnut    0.91
deſt        0.91
Monfieur    0.91
auffi       0.90
themſelves  0.89
Activations Density 0.322%