INDEX
Explanations
names and mentions of influential individuals or figures in various contexts
Negative Logits
hash      -0.15
hang      -0.15
hint      -0.15
handle    -0.13
hack      -0.13
ilogy     -0.12
handles   -0.12
ìĨ        -0.12
hide      -0.12
hello     -0.12
Positive Logits
-H        0.70
ÂłH       0.70
(H        0.67
_H        0.66
H         0.63
HC        0.60
,H        0.60
HH        0.59
HB        0.58
HL        0.57
Activations Density: 1.018%