INDEX
Explanations
words related to strong opinions or controversial topics
words related to expressions of emotions or states of being
Negative Logits
Token       Logit
ĻĤ          -0.65
Pengu       -0.62
oret        -0.61
Scriptures  -0.60
intest      -0.59
sag         -0.59
kilograms   -0.59
Spac        -0.58
Wonders     -0.58
fixme       -0.57
Positive Logits
Token       Logit
iness       0.93
mong        0.92
ously       0.88
naires      0.87
naire       0.83
seekers     0.83
zzle        0.79
oused       0.78
flies       0.78
bub         0.76
Activations Density: 0.104%