Explanations
really and genuinely positive
Negative Logits
unimaginable     0.63
extravagant      0.56
infamous         0.55
shocking         0.54
horrifying       0.53
bizarre          0.51
drastic          0.51
extreme          0.50
unprecedented    0.50
якобы            0.48
Positive Logits
really           0.89
naprawdę         0.77
действительно    0.75
vraiment         0.75
wirklich         0.74
really           0.73
realmente        0.71
Really           0.71
gerçekten        0.71
genuinely        0.71
Activations Density 0.004%