INDEX
Explanations
numeric values related to episodes and quantities
Negative Logits
ym       -0.16
agu      -0.15
agher    -0.15
sie      -0.15
bers     -0.15
age      -0.15
Ì£       -0.14
illet    -0.14
ync      -0.14
acz      -0.14
Positive Logits
atoi         0.16
MAV          0.15
565          0.15
ỹ            0.15
amac         0.15
edla         0.15
pha          0.15
ála          0.15
lish         0.14
imulator     0.14
Activations Density 0.202%