INDEX
Explanations
phrases related to offering help or assistance
phrases indicating support or assistance
Negative Logits
fault    -0.65
mart     -0.64
Fed      -0.64
Champ    -0.63
agues    -0.62
WARN     -0.60
mAh      -0.60
oos      -0.59
uld      -0.58
ames     -0.58
Positive Logits
regards    1.20
regard     1.07
standing   0.98
stood      0.94
impunity   0.88
draw       0.87
dignity    0.82
respect    0.78
drawn      0.78
holding    0.70
Activations Density: 0.112%
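
As a rough illustration of what a density figure like this usually means, the sketch below assumes that activation density is the fraction of sampled tokens on which the feature has a nonzero activation. That definition, the function name `activation_density`, and the sample data are all assumptions made for illustration; they are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumed definition: a token counts as "active" when its activation
    is strictly greater than the threshold (0.0 by default).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 tokens, 112 of which activate the feature.
# The counts were chosen only so the result matches the figure shown above.
sample = [0.0] * 99_888 + [1.5] * 112
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.112% means the feature fires on roughly one token in every 900, which would make it a fairly sparse feature.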