Explanations
mentions of the word "mom"
mentions of the term "mom"
Negative Logits
Token        Logit
isson        -0.70
Flavoring    -0.67
indal        -0.66
vernment     -0.66
protected    -0.64
etting       -0.62
IGHTS        -0.61
Topics       -0.60
UCK          -0.60
Tribunal     -0.59
Positive Logits
Token        Logit
hesis        1.13
mom          1.11
my           0.98
Mom          0.96
ma           0.91
dad          0.90
wife         0.89
Mom          0.86
mom          0.84
mers         0.82
Activations Density: 0.010%