INDEX
Explanations
abstract concepts related to formality and structure
terms related to forms and characteristics of reality
Negative Logits
kers     -0.75
vier     -0.74
fman     -0.74
Ͻ        -0.73
ERC      -0.71
secut    -0.70
kef      -0.69
ergy     -0.69
rompt    -0.68
vag      -0.67
Positive Logits
istically    0.74
atsu         0.70
butt         0.68
heads        0.66
Explosion    0.64
hound        0.64
ativity      0.63
Cloak        0.63
ipop         0.61
aries        0.61
Activations Density: 0.019%