INDEX
Explanations
deterministic, predictable, focused output
Negative Logits
novu    0.45
૩    0.44
входят    0.44
饷    0.43
坌    0.43
೬    0.42
೩    0.42
кла    0.41
๕    0.41
岕    0.41
Positive Logits
poor    0.67
poorly    0.64
unfairly    0.62
biased    0.61
less    0.60
dominated    0.59
extremist    0.59
lacks    0.58
mediocre    0.57
extreme    0.57
Activations Density: 0.044%