INDEX
Explanations
names or parts of names with "hel" in them
instances of the word "hel."
Negative Logits
Rated       -0.72
nomine      -0.69
entangled   -0.63
            -0.62
Flight      -0.62
fierce      -0.61
famous      -0.58
NTS         -0.58
GGGG        -0.58
EED         -0.57
Positive Logits
tered       1.12
mand        1.00
iflower     0.96
ters        0.92
itism       0.91
mes         0.90
brook       0.89
tering      0.88
pless       0.87
iman        0.85
Activations Density 0.009%