INDEX
NEGATIVE LOGITS
Agree         0.42
Key           0.42
Successfully  0.40
Think         0.40
Key           0.39
️             0.39
感到          0.38
başarı        0.37
所            0.37
insistence    0.37
POSITIVE LOGITS
curtail       0.63
eradicate     0.61
regulate      0.57
discourage    0.55
promote       0.54
curb          0.54
alleviate     0.54
undermine     0.53
revitalize    0.52
amelior       0.52
Activations Density 0.015%