INDEX
Explanations
quoted text, indicating speech or specific phrases
Negative Logits
Token     Logit
InView    -0.16
uman      -0.15
aran      -0.15
heimer    -0.15
Tone      -0.15
ellen     -0.14
stad      -0.14
ton       -0.14
TON       -0.14
atar      -0.14
Positive Logits
Token     Logit
-wide     0.23
-long     0.22
wide      0.20
diameter  0.20
wide      0.18
-radius   0.18
Rule      0.17
radius    0.16
nesota    0.16
Incre     0.16
Activations Density 0.031%