NEGATIVE LOGITS
duh            0.47
Paw            0.43
arbeitet       0.43
grund          0.42
Flor           0.42
rawler         0.41
deen           0.41
car            0.41
Ri             0.41
ür             0.40
POSITIVE LOGITS
difference     0.51
differences    0.50
διαφο          0.49
disclosure     0.48
additional     0.46
Differ         0.46
orig           0.45
difer          0.45
डिफर           0.44
额外的          0.44
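The source does not state how these lists were produced. For a sparse-autoencoder feature, one common approach (an assumption here, not confirmed by this page) is to take the dot product of the feature's decoder direction with each vocabulary token's unembedding vector and rank tokens by the result: the highest scores form the positive-logit list and the lowest form the negative one. The sketch below uses a made-up two-dimensional toy vocabulary; the function names `logit_effects` and `top_and_bottom`, and every number in it, are hypothetical.

```python
from typing import Dict, List, Tuple


def logit_effects(direction: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot product of a (hypothetical) feature direction with each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(direction, vec)) for tok, vec in unembed.items()}


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


# Toy example: hypothetical 2-d unembedding vectors, not taken from this feature.
unembed = {
    "difference": [1.0, 0.0],
    "additional": [0.5, 0.5],
    "duh": [-1.0, 0.0],
    "car": [0.0, -0.5],
}
direction = [0.5, 0.2]

pos, neg = top_and_bottom(logit_effects(direction, unembed), k=2)
print("positive:", pos)
print("negative:", neg)
```

In a real model the vocabulary has tens of thousands of entries and the projection is a single matrix-vector product, but the ranking step is the same.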
Activation density: 0.001%
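Activation density is usually reported as the percentage of tokens on which the feature fires, so 0.001% corresponds to roughly one token in 100,000. A minimal sketch, assuming "fires" means an activation strictly above zero (the threshold and the name `activation_density` are assumptions, not taken from the source):

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation is strictly above `threshold` (assumed definition)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# One firing token out of 100,000 gives a density of 0.001%.
acts = [0.0] * 99_999 + [3.2]
print(f"{activation_density(acts):.3f}%")  # prints "0.001%"
```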