INDEX
Explanations
principles, diet, phrases, office, game, heat, late, hope, happy, dust
Negative Logits
На    0.53
К     0.50
Ко    0.50
Ка    0.48
От    0.48
он    0.47
Ли    0.47
Си    0.47
Ко    0.47
До    0.46
Positive Logits
thorny          0.38
loopholes       0.35
one             0.34
situation       0.34
meticulously    0.33
rigorously      0.32
wording         0.32
substance       0.32
concrete        0.32
progress        0.31
Activations Density 0.008%