Explanations
adjectives related to size and power
occurrences of the word "giant"
Negative Logits
lest       -0.83
iring      -0.78
anwhile    -0.78
demand     -0.77
endment    -0.75
eligible   -0.75
etimes     -0.74
iency      -0.73
iggins     -0.71
tein       -0.71
Positive Logits
squid       0.99
mammoth     0.87
elephant    0.87
giant       0.86
gorilla     0.83
titan       0.83
iceberg     0.80
penis       0.80
monster     0.79
Slayer      0.78
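The two lists above rank vocabulary tokens by a per-token score and show the ten most negative and ten most positive. The dashboard does not say how the scores are produced. The sketch below assumes a common convention: each score is the dot product of the feature's decoder direction with that token's unembedding vector. The function names `logit_effects` and `top_and_bottom` and all vectors are hypothetical toy values, not real model weights.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    # ASSUMPTION: score = <feature decoder direction, token unembedding vector>.
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}


def top_and_bottom(scores: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    # Sort ascending once: the head holds the most negative scores,
    # and the reversed list's head holds the most positive ones.
    ranked = sorted(scores.items(), key=lambda kv: kv[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return positive, negative


if __name__ == "__main__":
    # Hypothetical 2-d toy vectors; token names borrowed from the lists above.
    toy_unembed = {
        "giant": [0.9, 0.1],
        "squid": [1.0, 0.0],
        "lest": [-0.8, 0.2],
        "demand": [-0.7, 0.5],
    }
    pos, neg = top_and_bottom(logit_effects([1.0, 0.0], toy_unembed), k=2)
    print("positive:", [t for t, _ in pos])  # ['squid', 'giant']
    print("negative:", [t for t, _ in neg])  # ['lest', 'demand']
```

Sorting the whole score dictionary once is fine at toy scale. For a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` / `heapq.nsmallest` avoids the full sort.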
Activation Density: 0.010%
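A density of 0.010% means the feature is active on roughly 1 token in 10,000. The sketch below assumes density is the percentage of sampled tokens whose activation exceeds zero. Both the zero threshold and the hypothetical `activation_density` helper are illustrative assumptions; the dashboard does not state its exact definition or sample size.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens on which the feature fires (activation > threshold).

    ASSUMPTION: "fires" means strictly greater than `threshold` (default 0.0).
    """
    if len(activations) == 0:
        raise ValueError("need at least one token activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


if __name__ == "__main__":
    # Hypothetical sample: the feature fires on 1 token out of 10,000.
    acts = [0.0] * 9_999 + [3.2]
    print(f"{activation_density(acts):.3f}%")  # prints 0.010%
```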