INDEX
Explanations
phrases or sentences emphasizing qualities, conditions, or comparisons
negating phrases that emphasize limits or exceptions
Negative Logits
ibaba         -0.89
etsk          -0.67
separat       -0.63
Pengu         -0.63
ovsky         -0.60
ashtra        -0.59
abi           -0.57
havoc         -0.57
ario          -0.57
Parenthood    -0.56
Positive Logits
means         0.81
virtue        0.79
leaps         0.74
uu            0.72
Means         0.71
umbers        0.69
dB            0.67
margins       0.67
proxy         0.65
products      0.65
Activations Density: 0.228%