Explanations
comparisons and expressions of opinions using the word "like."
expressions of opinion or judgment
Negative Logits
Token          Logit
milo           -0.78
"]=>           -0.78
*/(            -0.73
secondly       -0.69
bara           -0.68
ounty          -0.67
ARS            -0.67
conservancy    -0.67
encers         -0.66
ashore         -0.66
Positive Logits
Token          Logit
innocuous      0.92
reasonable     0.88
distant        0.86
contradiction  0.84
logical        0.82
quaint         0.81
brainer        0.81
sensible       0.79
oxy            0.78
harmless       0.78
Activations Density: 0.144%