Explanations
phrases related to additions or new elements being introduced
instances of the word "addition" in various contexts
Negative Logits
Token      Logit
zh         -0.70
zees       -0.67
rior       -0.65
raz        -0.65
yah        -0.63
bis        -0.62
zi         -0.62
zee        -0.61
walking    -0.61
mos        -0.60
Positive Logits
Token      Logit
endum       0.99
xual        0.86
ition       0.84
itious      0.80
verted      0.78
Flavoring   0.76
thereto     0.73
xon         0.73
insult      0.71
bonus       0.69
Activation Density: 0.020%