INDEX
Explanations
significant verbs or actions related to experiences and outcomes
Negative Logits
151          -0.16
imore        -0.15
ibal         -0.15
elligence    -0.15
kinh         -0.14
getline      -0.14
ibir         -0.14
lÃŃÄį       -0.14
isay         -0.14
illon        -0.13
POSITIVE LOGITS
allee        0.14
envelopes    0.14
PFN          0.14
_HERSHEY     0.14
alt          0.14
612          0.14
envelope     0.14
ets          0.14
ode          0.14
.libs        0.14
Activations Density 0.039%