Explanations
phrases related to statements or claims being made
Negative Logits
hens        -0.76
assis       -0.69
sit         -0.68
icals       -0.65
terson      -0.62
herbs       -0.62
peppers     -0.61
resil       -0.59
Strip       -0.58
inspected   -0.58
Positive Logits
sure        0.90
payable     0.80
arations    0.79
BLIC        0.78
dinand      0.73
vious       0.72
Dispatch    0.72
itious      0.71
ailable     0.70
available   0.69
Activations Density: 0.044%