Negative logits
  Token     Value
  be        1.34
  være      1.05
  бъдат     0.92
  happen    0.87
  thing     0.86
  be        0.85
  быть      0.85
  помочь    0.83
  helfen    0.83
  zostać    0.83

Positive logits
  Token         Value
  refuses       1.07
  advantages    1.03
  imposes       0.95
  avantages     0.94
  রেখেছে        0.94
  benefits      0.93
  embodies      0.92
  lacks         0.89
  strengthens   0.89
  prohibits     0.89

Activation density: 0.026%