Explanations
adjective-noun pairs indicating satisfaction or acceptance
variations of the word "content."
Negative Logits
rolet -0.71
damn -0.63
ć -0.61
Siem -0.60
udeb -0.59
DAM -0.59
ipers -0.59
iami -0.59
Dee -0.58
JD -0.57
Positive Logits
edly 1.44
ment 1.06
ioned 0.99
ions 0.91
iar 0.85
content 0.82
icut 0.81
onite 0.80
Content 0.80
ication 0.79
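Here is a minimal sketch of how lists like these are commonly produced. It assumes, although the dashboard does not say so, that each score is the dot product of the feature's decoder direction with one column of the model's unembedding matrix (LayerNorm ignored), and that the k lowest and k highest scores are reported. The vocabulary, weights, and function names are hypothetical toy values, not the real model's.

```python
# Hypothetical sketch: rank vocabulary tokens by their logit effect.
# Assumption: effect(token) = dot(decoder_direction, W_U[:, token]), LayerNorm ignored.

def logit_effects(decoder_direction, unembed):
    """One score per vocab token; unembed is a d_model x vocab_size nested list."""
    vocab_size = len(unembed[0])
    return [
        sum(d * row[t] for d, row in zip(decoder_direction, unembed))
        for t in range(vocab_size)
    ]

def top_and_bottom(vocab, scores, k):
    """Return the k most positive and k most negative (token, score) pairs."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]          # ascending order: most negative first
    positive = ranked[::-1][:k]    # descending order: most positive first
    return positive, negative

# Toy data (hypothetical): d_model = 2, vocab_size = 4.
vocab = ["content", "damn", "ment", "the"]
unembed = [
    [0.8, -0.6, 0.4, 0.0],
    [0.1, -0.2, 0.6, 0.0],
]
direction = [1.0, 0.5]

positive, negative = top_and_bottom(vocab, logit_effects(direction, unembed), k=2)
print("Negative Logits")
for token, score in negative:
    print(f"{token} {score:.2f}")
print("Positive Logits")
for token, score in positive:
    print(f"{token} {score:.2f}")
```

Sorting once and slicing both ends keeps the two lists consistent with each other. With a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nsmallest`/`heapq.nlargest` would avoid the full sort.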
Activation Density: 0.020%
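Here is a minimal sketch of the density figure. It assumes the figure is the percentage of token positions on which the feature's activation is strictly positive. The function name, threshold parameter, and toy activation list are hypothetical. By this definition, one firing position out of 5,000 gives 0.020%.

```python
# Hypothetical sketch: activation density as the percentage of token
# positions where the feature fires (activation strictly above threshold).

def activation_density(activations, threshold=0.0):
    """Percentage of positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

# Toy data (hypothetical): the feature fires on 1 of 5,000 tokens.
acts = [0.0] * 4999 + [1.3]
print(f"Activation Density: {activation_density(acts):.3f}%")
```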