INDEX
Explanations
adjectives describing qualities or characteristics
Negative Logits
HAEL          -0.74
ULTS          -0.71
anwhile       -0.67
interrupted   -0.65
destro        -0.63
lished        -0.63
kefeller      -0.61
ELL           -0.61
BAT           -0.60
arks          -0.60
Positive Logits
entially      0.84
able          0.79
ative         0.78
ãĤ¦ãĤ¹      0.77
ically        0.74
liness        0.72
oscope        0.70
phabet        0.70
abouts        0.69
hing          0.68
Activations Density 0.015%