INDEX
Explanations
references to the word "this" in various contexts
Negative Logits
uga
-0.15
min
-0.14
outright
-0.14
{:.
-0.13
lig
-0.13
знаком
-0.13
ter
-0.13
vice
-0.13
resident
-0.13
tight
-0.13
Positive Logits
->
0.38
->_
0.32
->___
0.32
->
0.22
::$
0.21
->$
0.21
->__
0.21
-&
0.20
->{
0.20
->{$
0.19
Activations Density 0.004%