INDEX
Explanations
phrases indicating change or transformation in appearance and significance
Negative Logits
Token        Logit
ook          -0.15
at           -0.14
aba          -0.13
331          -0.13
cho          -0.13
ey           -0.13
phenomena    -0.13
Sheldon      -0.13
rex          -0.13
choice       -0.13
Positive Logits
Token              Logit
meaning            0.26
significance       0.24
meanings           0.23
shape              0.21
meaning            0.20
Meaning            0.20
importance         0.20
status             0.20
dimensions         0.19
characteristics    0.19
Activations Density 0.275%