INDEX
Explanations
phrases related to expressing dissatisfaction or disapproval
instances of words related to the concept of meaning or significance
Negative Logits
Token         Logit
shire         -0.72
Compass       -0.67
lines         -0.64
liners        -0.64
ython         -0.63
Rabbit        -0.60
Pixie         -0.60
Anthropology  -0.59
rag           -0.58
enegger       -0.58
Positive Logits
Token         Logit
ctions        1.05
fters         1.04
pered         1.03
fter          1.02
vers          0.99
ction         0.99
isance        0.98
uthor         0.98
vered         0.97
BILITY        0.97
Activations Density: 0.109%