INDEX
Explanations
phrases related to congratulations or celebratory expressions
Negative Logits
uce  -0.19
conf  -0.16
шев  -0.15
lopen  -0.14
ipes  -0.14
oxy  -0.14
-*-\r\n  -0.14
uw  -0.14
berger  -0.14
ansi  -0.14
Positive Logits
rats  0.32
estion  0.29
regation  0.26
Cong  0.25
Cong  0.23
rat  0.20
ested  0.19
erville  0.17
cong  0.17
ole  0.17
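The two logit lists rank vocabulary tokens by how strongly the feature pushes their output logits up or down. A minimal sketch of how such lists can be produced, assuming (this is not stated above) that the score is the feature's decoder direction projected through the model's unembedding matrix. The function name `logit_effects`, its arguments, and the toy numbers are hypothetical and only for illustration:

```python
def logit_effects(decoder_dir, unembed, vocab, k=10):
    """Rank tokens by the logit change a feature direction induces.

    decoder_dir: list of d_model floats (hypothetical feature direction).
    unembed: d_model rows, each a list with one weight per vocab token.
    Returns (most negative k, most positive k) as (token, score) pairs.
    """
    d_model = len(decoder_dir)
    # Score for token j is the dot product of the direction with column j.
    scores = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]           # lowest scores first
    positive = ranked[::-1][:k]     # highest scores first
    return negative, positive


# Toy example with made-up weights (d_model = 2, four tokens).
vocab = ["Cong", "rats", "uce", "conf"]
direction = [1.0, 0.5]
W_U = [[0.2, 0.3, -0.1, -0.1],
       [0.1, 0.0, -0.2, 0.0]]
neg, pos = logit_effects(direction, W_U, vocab, k=2)
print([t for t, _ in neg])  # ['uce', 'conf']
print([t for t, _ in pos])  # ['rats', 'Cong']
```

Sorting the full vocabulary is fine for a sketch; with a real vocabulary of tens of thousands of tokens, a partial selection (e.g. a heap) would avoid the full sort.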
Activation Density: 0.007%
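A density of 0.007% means the feature is very sparse. A minimal sketch of the calculation, assuming (not stated above) that density is the percentage of token positions where the feature's activation is above zero. The function name `activation_density` and the `threshold` parameter are hypothetical:

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# 7 firing positions out of 100,000 tokens gives 0.007%.
acts = [0.0] * 100_000
for pos in (10, 500, 9_000, 20_000, 45_000, 70_000, 99_999):
    acts[pos] = 1.3  # arbitrary positive activation
print(activation_density(acts))
```

At this rate the feature fires on roughly 7 of every 100,000 tokens. That is consistent with a narrow trigger such as congratulatory phrases.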