INDEX
Explanations
phrases related to strong opinions or stances on various topics
phrases or terms that support a particular viewpoint or ideology
Negative Logits
ĸļ          -0.91
ãĤ¼ãĤ¦ãĤ¹   -0.77
Halls       -0.68
Sins        -0.65
Gorge       -0.65
inia        -0.64
Dickens     -0.62
Twain       -0.62
Brooks      -0.60
pains       -0.60
Positive Logits
digy        1.45
dding       1.27
actively    1.26
verbs       1.16
pelling     1.15
dig         1.11
ccess       1.10
strate      1.09
gressive    1.06
ctor        1.06
Activations Density 0.015%
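
Two of the negative-logit tokens ("ĸļ" and "ãĤ¼ãĤ¦ãĤ¹") look like encoding debris but are more likely raw vocabulary entries from a byte-level BPE tokenizer, in which every byte is shown as a printable stand-in character. The sketch below assumes the model uses a GPT-2-style byte-to-character table; the dashboard does not name its tokenizer, so this is an assumption. The helper names are illustrative, with `bytes_to_unicode` modeled on the helper of the same name in the GPT-2 reference encoder.

```python
def bytes_to_unicode():
    """Build the byte -> printable-character table used by GPT-2-style
    byte-level BPE (assumed here; the dashboard does not name its tokenizer)."""
    # Bytes that are already printable map to themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point starting at 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: stand-in character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def token_to_bytes(token):
    """Recover the raw bytes behind a byte-level token string."""
    return bytes(UNICODE_TO_BYTE[ch] for ch in token)


def decode_token(token):
    """Decode a token to text; incomplete UTF-8 sequences become U+FFFD."""
    return token_to_bytes(token).decode("utf-8", errors="replace")


print(decode_token("ãĤ¼ãĤ¦ãĤ¹"))   # ゼウス
print(token_to_bytes("ĸļ"))        # b'\x96\x9a'
```

Under that assumption, "ãĤ¼ãĤ¦ãĤ¹" is the katakana string ゼウス, while "ĸļ" maps to two UTF-8 continuation bytes and is therefore only a fragment of a longer character, not readable text on its own. Plain ASCII tokens such as "actively" pass through unchanged.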