INDEX
Explanations
terms related to martial arts, specifically those associated with Kung Fu
Negative Logits
hra     -0.16
hn      -0.16
chn     -0.15
iti     -0.15
vable   -0.15
ξη      -0.15
izado   -0.15
etz     -0.15
hong    -0.14
ethe    -0.14
Positive Logits
sten      0.19
lasses    0.19
su        0.18
arian     0.16
uestion   0.16
flen      0.16
lish      0.16
aroo      0.15
tol       0.15
kas       0.15
Activations Density: 0.007%