Explanations
sup followed by common endings
Negative Logits
気に       0.41
REZ       0.41
[         0.40
rène      0.40
ğun       0.40
risome    0.39
િયન       0.38
orant     0.38
ivalent   0.37
int       0.37
Positive Logits
Sup       0.67
Supers    0.58
Sup       0.57
supers    0.56
supers    0.54
supp      0.50
sup       0.50
suppl     0.49
sup       0.49
Supp      0.46
Activations Density 0.001%
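
The density figure can be read as the share of sampled tokens on which the feature is active. The following is a minimal Python sketch under that assumption; the page does not state its exact definition, and the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical, chosen only so the arithmetic lands on 0.001%.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    fires; the source does not spell out its definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1 active token out of 100,000 sampled tokens.
sample = [0.0] * 99_999 + [3.2]
density = activation_density(sample)
print(f"{density:.3%}")  # 0.001%
```

Under this reading, a density of 0.001% means the feature fires on roughly one token in every 100,000, which fits a feature tied to a narrow family of "sup"-prefixed tokens.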