Explanations
words related to curiosity or interest
expressions of curiosity
Negative Logits
Token         Logit
ohm           -0.76
ACP           -0.74
atra          -0.65
Depot         -0.65
ayers         -0.63
od            -0.62
accept        -0.62
Physicians    -0.62
oan           -0.62
Surv          -0.61
Positive Logits
Token         Logit
curious       1.20
curiosity     1.11
passers       0.95
Curious       0.87
onlook        0.84
iously        0.79
ioned         0.76
inqu          0.76
edIn          0.75
probing       0.74
Activations Density: 0.006%