Explanations
phrases indicating size or magnitude
Negative Logits
roma     -0.15
sund     -0.15
igne     -0.15
ugo      -0.14
Bucc     -0.13
ittest   -0.13
/Peak    -0.13
ovo      -0.13
andin    -0.13
kans     -0.13
Positive Logits
list     0.19
number   0.17
amount   0.17
portion  0.16
retty    0.15
umber    0.15
amount   0.15
range    0.15
set      0.14
Ctrls    0.14
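The page does not say how the logit lists were computed. A common approach for SAE feature dashboards, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and keep the k highest and k lowest scoring tokens. The sketch below shows that idea in plain Python. The function names `logit_effects` and `top_and_bottom` and the d_model x vocab layout of `unembed` are hypothetical illustrations, not the dashboard's actual code.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Score every vocabulary token by projecting the feature's decoder
    direction through the unembedding (assumed layout: d_model x vocab)."""
    d_model = len(decoder_dir)
    vocab = len(unembed[0])
    return [sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
            for t in range(vocab)]


def top_and_bottom(scores: Sequence[float], tokens: Sequence[str], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, score) pairs."""
    ranked = sorted(zip(tokens, scores), key=lambda pair: pair[1])
    negative = ranked[:k]          # lowest scores first
    positive = ranked[::-1][:k]    # highest scores first
    return positive, negative


# Toy example with a 2-dimensional residual stream and a 3-token vocabulary.
scores = logit_effects([1.0, 0.0], [[0.2, -0.1, 0.5], [9.0, 9.0, 9.0]])
positive, negative = top_and_bottom(scores, ["list", "roma", "number"], k=1)
print(positive, negative)
```

Under this assumption, a positive score means that when the feature fires, it pushes the model toward predicting that token; a negative score pushes the model away from it.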
Activation density: 0.166%
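Activation density is usually read as the fraction of sampled tokens on which the feature is active. A density of 0.166% corresponds to roughly 166 firing tokens per 100,000. The minimal sketch below assumes "active" means an activation strictly above zero. The page does not state the threshold or the token sample used, and `activation_density` is a hypothetical helper.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the feature's activation exceeds
    `threshold` (assumed to be 0, i.e. any nonzero positive activation)."""
    total = 0
    firing = 0
    for a in activations:
        total += 1
        if a > threshold:
            firing += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return firing / total


# 2 firing positions out of 1,000 tokens -> density 0.002, i.e. 0.2%.
density = activation_density([0.0] * 998 + [1.2, 0.4])
print(f"{density:.3%}")
```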