Explanations
abbreviations and numerical data
Negative Logits
Token       Logit
[missing]   -0.17
qu          -0.16
osi         -0.15
itz         -0.14
ider        -0.14
ulse        -0.14
871         -0.14
ungan       -0.14
Tavern      -0.14
vol         -0.14

Positive Logits
Token       Logit
º           0.17
lli         0.17
coli        0.15
________    0.15
okie        0.15
ICO         0.15
°           0.14
wav         0.14
tracted     0.14
yla         0.14
Activations Density 0.059%