INDEX
Explanations
phrases related to common sense
references to "common sense."
Negative Logits
Token     Logit
Stars     -0.77
chrom     -0.72
ISS       -0.71
atern     -0.70
raph      -0.69
quart     -0.68
ench      -0.68
soon      -0.67
▬▬        -0.64
seys      -0.64
Positive Logits
Token     Logit
smanship  0.93
dictates  0.77
chops     0.70
decency   0.69
advice    0.67
ensical   0.67
scissors  0.66
manship   0.66
Reason    0.65
prag      0.65
Activations Density 0.063%
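
To put the density figure in concrete terms, the sketch below assumes that activation density means the fraction of sampled token positions on which the feature fires (activation above zero). That definition, the function name activation_density, and the sample data are illustrative assumptions, not taken from the page. Under that reading, 0.063% corresponds to roughly 63 active positions per 100,000 tokens.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of token positions where the
    feature is active; the source page does not define the term.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 token positions, 63 of them active.
sample = [0.0] * 99_937 + [1.5] * 63
density = activation_density(sample)
print(f"{density:.3%}")
```

A feature this sparse is silent on almost all text, which fits a feature described as firing only on phrases about common sense.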