INDEX
Explanations
suggesting options and examples
Negative Logits
rohkem  0.48
manejar  0.39
bullshit  0.38
StringSet  0.38
be  0.37
mettre  0.37
die  0.36
unobstructed  0.36
interfere  0.36
ك  0.36
Positive Logits
甚至  0.59
मसलन  0.54
thậm  0.51
навіть  0.51
например  0.49
even  0.49
기준으로  0.48
เช่น  0.48
даже  0.48
甚至是  0.48
Activations Density 0.022%