INDEX
Explanations
words related to understanding and affirmation
words that indicate confirmation, affirmation, or agreement
Negative Logits
fml        -0.68
chrom      -0.67
jri        -0.67
fuss       -0.66
challeng   -0.66
awa        -0.65
elf        -0.65
flix       -0.62
pid        -0.62
esm        -0.61
Positive Logits
ments      1.89
ment       1.83
ations     1.63
ation      1.46
ement      1.42
ably       1.42
MENT       1.41
MENTS      1.36
ements     1.34
ATIONS     1.31
Activations Density: 0.143%