INDEX
Explanations
instances of the word "wrong" along with related terms
instances of the word "wrong" and its variations used to describe mistakes or injustices
Negative Logits
enance         -0.71
Ri             -0.68
ILA            -0.67
tsky           -0.63
Chill          -0.61
Swim           -0.59
indemn         -0.58
unavailable    -0.56
hew            -0.55
hyde           -0.55
Positive Logits
headed         1.49
fully          1.47
do             1.12
doing          1.08
fulness        1.06
footed         1.00
ful            0.97
behavior       0.85
ed             0.85
eous           0.84
Activations Density 0.045%