NEGATIVE LOGITS
Low           0.40
Was           0.39
either        0.39
both          0.38
either        0.38
7             0.37
all           0.37
both          0.37
5             0.36
8             0.36
POSITIVE LOGITS
downright     0.50
prejudices    0.47
whatnot       0.45
addirittura   0.44
importantly   0.43
refusal       0.42
accusations   0.42
frankly       0.41
måde          0.41
fears         0.41
Activations Density 0.083%