INDEX
Explanations
words related to confirming or agreeing with a statement or belief
terms indicating affirmation or endorsement
Negative Logits
Token      Logit
devices    -0.72
Ox         -0.72
hy         -0.68
OTOS       -0.66
sites      -0.66
bags       -0.65
ONES       -0.65
Spoiler    -0.65
hner       -0.63
Gorge      -0.62
Positive Logits
Token        Logit
affirm       1.37
irmation     1.31
affirmation  1.31
irming       1.26
irmed        1.26
irm          1.09
irms         1.03
atively      0.98
reaff        0.92
acceptance   0.84
Activations Density 0.012%
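
To give a sense of scale for the density figure, the arithmetic can be sketched as below. This assumes "Activations Density" means the fraction of sampled tokens on which the feature has a nonzero activation; the page itself does not define the term. The helper name expected_active_tokens and the one-million-token sample size are hypothetical, chosen only for illustration.

```python
# Assumption: "Activations Density" is the fraction of sampled tokens on
# which the feature fires (nonzero activation). Not defined in the source.
DENSITY_PERCENT = 0.012  # value reported above


def expected_active_tokens(density_percent: float, n_tokens: int) -> float:
    """Expected number of tokens, out of n_tokens, on which the feature fires."""
    return density_percent / 100.0 * n_tokens


if __name__ == "__main__":
    # Hypothetical sample size, for illustration only.
    n = 1_000_000
    print(f"~{expected_active_tokens(DENSITY_PERCENT, n):.0f} active tokens per {n:,}")
```

Under that reading, the feature is sparse: it fires on roughly one token in every eight thousand.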