INDEX
Explanations
adjectives related to strength or power
instances of the word "strong" in various contexts
Negative Logits
externalToEVAOnly   -0.83
Sloan               -0.74
Canaver             -0.73
Newly               -0.72
kay                 -0.72
Journals            -0.72
Correction          -0.68
Incarnation         -0.68
apolis              -0.67
phis                -0.67
Positive Logits
enough      1.03
enough      0.98
nesses      0.96
ener        0.87
winds       0.85
hitter      0.83
deterrent   0.77
sword       0.76
promot      0.74
ament       0.74
Activations Density 0.024%