Explanations
instances of the word "correct" and related terms indicating accuracy or truthfulness
Negative Logits
Token               Logit
holder              -0.19
ISTIC               -0.17
klady               -0.17
(token not shown)   -0.17
istic               -0.16
ittings             -0.16
holders             -0.16
getic               -0.16
jective             -0.16
ITY                 -0.15
Positive Logits
Token               Logit
ively               0.66
ive                 0.49
ors                 0.46
iveness             0.44
ives                0.43
ivity               0.40
ual                 0.35
ible                0.33
eur                 0.33
ually               0.32
Activations Density: 0.155%