INDEX
Explanations
exclamatory expressions
exclamatory statements or expressions of enthusiasm
Negative Logits
manif        -0.84
destro       -0.81
tyr          -0.77
glim         -0.76
restraints   -0.74
surpr        -0.72
metic        -0.70
arrang       -0.70
charact      -0.69
dams         -0.68
Positive Logits
@#&          1.38
#$           1.15
@#           0.93
?!           0.93
important    0.91
ctory        0.81
olkien       0.72
?,           0.71
Shine        0.71
~            0.70
Activations Density 0.069%