INDEX
Explanations
numerical values related to time
the presence of the token "0"
Negative Logits
symp        -0.72
ener        -0.71
challeng    -0.71
pse         -0.68
forth       -0.68
indo        -0.66
milo        -0.66
ei          -0.63
coerc       -0.62
stru        -0.62
Positive Logits
SHARES      1.44
Shares      1.03
Expand      0.78
Explicit    0.77
Ratings     0.69
Comments    0.68
Investig    0.68
Clean       0.67
Arrest      0.67
Detected    0.64
Activations Density: 0.019%