Explanations
mentions of crying or emotions related to crying
Negative Logits
MG           -0.63
eties        -0.61
ratulations  -0.61
aughs        -0.60
velength     -0.59
contrace     -0.59
auga         -0.59
ilibrium     -0.58
awaru        -0.58
atively      -0.56
Positive Logits
stals        1.39
baby         1.26
stall        1.13
ogenic       0.99
sis          0.99
ogen         0.98
stal         0.96
pter         0.95
onics        0.92
wolf         0.90
Activations Density 0.040%