INDEX
Explanations
text related to personal experiences and interactions
phrases related to personal experiences and actions
Negative Logits
udder         -0.65
Known         -0.64
Males         -0.63
currently     -0.63
ielding       -0.63
wearer        -0.62
FIG           -0.61
SPONSORED     -0.61
Their         -0.59
forthcoming   -0.59
Positive Logits
myself        1.04
yesterday     0.92
intending     0.76
kidding       0.75
[             0.71
aback         0.68
['            0.67
poke          0.66
netflix       0.66
ourselves     0.66
Activations Density 0.901%