INDEX
Explanations
sets of instructions with a specific action followed by a reason or consequence
conditional phrases that imply consequences or results
Negative Logits
Token           Logit
Mens            -0.69
inch            -0.69
understatement  -0.69
glances         -0.66
gallery         -0.64
praise          -0.62
question        -0.60
Nay             -0.59
ropolitan       -0.58
disbelief       -0.58
Positive Logits
Token           Logit
oner            0.98
bered           0.98
arer            0.95
apy             0.90
letes           0.90
othe            0.86
oths            0.84
aps             0.82
ooo             0.82
fter            0.81
Activations Density: 0.075%