INDEX
Explanations
occurrences of the word "hit" in various contexts
Negative Logits
hire    -0.19
ered    -0.19
iac     -0.17
erre    -0.16
erge    -0.16
ths     -0.15
erie    -0.15
eren    -0.15
hand    -0.15
sed     -0.15
Positive Logits
TING    0.30
ting    0.28
achi    0.24
ACHI    0.21
parade  0.19
ler     0.19
REC     0.18
maker   0.17
/goto   0.17
lers    0.17
Activations Density 0.020%