Explanations
negatively connotated words or phrases
occurrences of the prefix "unf", indicating negation or absence
Negative Logits
gorilla        -0.68
ãĥ£            -0.67
gas            -0.65
partName       -0.65
buckle         -0.65
gear           -0.65
Dag            -0.64
Madness        -0.64
Kry            -0.63
simulations    -0.62
Positive Logits
riend          1.62
athom          1.62
ortunately     1.50
ortun          1.49
avour          1.46
aith           1.41
azed           1.34
rozen          1.32
ashion         1.32
avored         1.31
Activations Density 0.014%