NEGATIVE LOGITS
erd       0.96
erst      0.94
erde      0.94
andrew    0.92
erg       0.90
erns      0.88
eric      0.86
heni      0.86
oció      0.85
uegos     0.85
POSITIVE LOGITS
treating      0.79
putting       0.77
inhibiting    0.70
shutting      0.70
browse        0.67
reassuring    0.66
placing       0.65
neutralize    0.65
looking       0.65
suppressing   0.64
Activations Density: 0.000%