INDEX
Explanations
the repeated use of the word "hal" in various contexts
Negative Logits
ollen    -0.18
escal    -0.17
y        -0.16
ÑįÑĤ     -0.16
i        -0.15
esen     -0.15
Gibbs    -0.15
ess      -0.15
ups      -0.15
ña       -0.14
Positive Logits
ting     0.24
ifax     0.23
stead    0.23
ftime    0.23
ogen     0.23
cy       0.22
oreach   0.20
ogens    0.20
ibur     0.20
vor      0.19
Activations Density: 0.008%