INDEX
Explanations
formatted block structures or sections that organize textual information
Negative Logits
lain       -0.83
McDonnell  -0.71
Cameron    -0.68
funn       -0.68
Xavier     -0.65
alys       -0.65
terday     -0.65
bearer     -0.65
mallow     -0.63
Cumber     -0.61
Positive Logits
------------  1.35
----------------------------------------------------------------  1.35
--------  1.33
----------------  1.31
-----------  1.31
------------------------------------------------  1.28
------------------------  1.28
-------  1.27
--------------  1.26
---------------  1.25
Activations Density 0.004%