Explanations
statements where someone is being validated or confirmed as correct
statements affirming correctness or agreement
Negative Logits
Token       Logit
earance     -0.65
Gong        -0.62
cano        -0.62
irts        -0.62
battle      -0.61
gery        -0.60
nets        -0.60
stability   -0.59
untu        -0.59
ains        -0.59
Positive Logits
Token       Logit
footed      0.94
eous        0.84
sighted     0.78
terday      0.72
Osw         0.72
ignorant    0.68
smack       0.68
fully       0.67
utherford   0.66
eyed        0.65
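The page does not say how these logit values were produced. A common approach, assumed here, is to project the feature's decoder direction through the model's unembedding and keep the most extreme tokens. The sketch below covers only the ranking step: given a token-to-logit mapping (filled with the values listed above), it returns the k highest and k lowest entries. The function name `top_and_bottom` is hypothetical.

```python
# Token -> logit values copied from the two lists above
# (only the ten most extreme tokens on each side appear on the page).
logits = {
    "earance": -0.65, "Gong": -0.62, "cano": -0.62, "irts": -0.62,
    "battle": -0.61, "gery": -0.60, "nets": -0.60, "stability": -0.59,
    "untu": -0.59, "ains": -0.59,
    "footed": 0.94, "eous": 0.84, "sighted": 0.78, "terday": 0.72,
    "Osw": 0.72, "ignorant": 0.68, "smack": 0.68, "fully": 0.67,
    "utherford": 0.66, "eyed": 0.65,
}


def top_and_bottom(logit_map, k):
    """Hypothetical helper: return the k highest and k lowest (token, logit) pairs."""
    # Sort ascending by logit; Python's sort is stable, so ties keep insertion order.
    ranked = sorted(logit_map.items(), key=lambda kv: kv[1])
    bottom = ranked[:k]                 # most negative first
    top = list(reversed(ranked[-k:]))   # most positive first
    return top, bottom


top, bottom = top_and_bottom(logits, 3)
print(top)     # [('footed', 0.94), ('eous', 0.84), ('sighted', 0.78)]
print(bottom)  # [('earance', -0.65), ('Gong', -0.62), ('cano', -0.62)]
```

Ties such as the three tokens at -0.62 are ordered by their position in the input here; a real dashboard may break ties differently.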
Activation Density: 0.069%
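Assuming activation density means the share of token positions on which the feature fires (activation above zero), 0.069% equals 0.00069, or roughly 69 of every 100,000 tokens. The sketch below computes that fraction for a synthetic activation list; the threshold, the data, and the function name `activation_density` are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical helper: fraction of positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic example: 69 active positions out of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(69):
    acts[i * 1000] = 1.5  # arbitrary positive activation value

density = activation_density(acts)
print(f"{density:.3%}")  # 0.069%
```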