INDEX
Explanations
instances of congratulations or expressions of achievement
Negative Logits
MBER         -0.18
apyrus       -0.17
ansi         -0.17
Directories  -0.15
arkin        -0.15
adows        -0.15
indow        -0.14
bourg        -0.14
Gott         -0.14
anova        -0.14
Positive Logits
regation     0.40
rats         0.38
estion       0.36
rat          0.31
reg          0.30
ested        0.29
ole          0.26
lom          0.26
reve         0.25
reso         0.24
Activations Density: 0.010%