INDEX
Explanations
instances of numerical values
NEGATIVE LOGITS
utherland   -0.17
LETE        -0.17
egal        -0.16
tere        -0.15
orraine     -0.15
jo          -0.14
illi        -0.14
847         -0.14
846         -0.14
dux         -0.14
POSITIVE LOGITS
Guth        0.17
porno       0.16
adle        0.16
affer       0.15
군          0.15
เปอร        0.15
YSIS        0.15
cart        0.14
bote        0.14
avian       0.14
Activations Density 0.000%