INDEX
Explanations
words related to promises or commitments
words related to obligations and responsibilities
Negative Logits
Caldwell      -0.71
mir           -0.67
ushi          -0.67
infiltrate    -0.65
locality      -0.63
burner        -0.58
cube          -0.57
SEE           -0.57
bit           -0.56
fortune       -0.56
Positive Logits
idth          1.02
owed          0.85
ield          0.84
ielding       0.76
itting        0.74
orld          0.74
NESS          0.74
adoes         0.71
ness          0.70
ows           0.70
Activations Density: 0.007%