INDEX
Explanations
instances of expressing surprise about something being observed for the first time
expressions of novelty or unique experiences
Negative Logits
externalActionCode  -0.68
ials                -0.66
recy                -0.61
ayers               -0.59
Shape               -0.58
bi                  -0.56
Charge              -0.56
roup                -0.55
uli                 -0.55
refin               -0.55
Positive Logits
anything            0.93
anybody             0.92
anyone              0.84
anything            0.77
anywhere            0.76
dime                0.74
ANY                 0.74
nor                 0.69
bothered            0.68
any                 0.67
Activations Density 0.069%