Explanations
references to the word "congratulations" and its variations
Negative Logits
icip       -0.15
ogl        -0.15
iera       -0.15
icias      -0.15
ث          -0.14
Cab        -0.14
etz        -0.14
icrous     -0.14
AsString   -0.14
θη         -0.14
Positive Logits
estion      0.26
regation    0.21
Cong        0.19
rats        0.19
Cong        0.19
ional       0.18
ault        0.16
hton        0.16
ado         0.15
uis         0.15
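On SAE feature dashboards of this kind, the positive and negative logit lists are commonly obtained by projecting the feature's decoder direction through the model's unembedding matrix and then keeping the largest and smallest entries. The sketch below assumes that convention. `logit_effects`, `decoder_direction`, `unembed` and `vocab` are hypothetical names, and the toy numbers in the usage example are not taken from this feature.

```python
from typing import List, Tuple


def logit_effects(decoder_direction: List[float],
                  unembed: List[List[float]],
                  vocab: List[str],
                  k: int = 10) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, logit) pairs.

    Hypothetical helper. Assumes `unembed` is d_model x vocab_size, so row i
    holds residual dimension i and column t belongs to token vocab[t].
    """
    d_model = len(decoder_direction)
    # Direct logit effect per token: dot product of the decoder direction
    # with that token's column of the unembedding matrix.
    scores = [
        sum(decoder_direction[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


# Toy usage with made-up values (d_model = 2, four tokens).
neg, pos = logit_effects([1.0, 0.5],
                         [[0.2, -0.1, 0.0, 0.3],
                          [0.1, 0.4, -0.6, 0.0]],
                         ["a", "b", "c", "d"], k=2)
print(neg)
print(pos)
```

A real model's vocabulary contains many subword pieces. That is why fragments such as "icip" or "rats" can appear in the lists next to whole words.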
Activation density: 0.015%
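Suppose density is the percentage of sampled token positions at which the feature's activation is strictly positive. The zero threshold is an assumption about how this dashboard counts firing. Under that reading, 0.015% works out to about 15 firing positions per 100,000 tokens, so the feature is very sparse. The helper name `activation_density` is hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float]) -> float:
    """Percentage of positions whose activation is strictly positive.

    Hypothetical helper; the > 0 firing threshold is an assumption.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > 0)
    return 100.0 * firing / len(activations)


# 15 firing positions out of 100,000 sampled tokens.
sample = [0.0] * 99_985 + [1.0] * 15
print(activation_density(sample))
```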