INDEX
Explanations
words related to power, strength, and performance
terms related to damage, power, performance, and other metrics of effectiveness in various contexts
Negative Logits
creen         -0.75
ovie          -0.69
uthor         -0.66
Tale          -0.65
Rue           -0.65
igl           -0.62
Friendship    -0.62
Vote          -0.61
Beck          -0.60
ournal        -0.59
Positive Logits
compared       0.87
capability     0.81
iencies        0.80
efficiency     0.79
output         0.77
advantages     0.77
capabilities   0.77
advantage      0.75
requirements   0.74
destro         0.74
Activations Density: 0.241%