INDEX
Explanations
text enclosed in quotation marks
instances of parentheses or quotes that indicate citations or references
Negative Logits
Token     Logit
orate     -0.67
RC        -0.66
Marble    -0.63
Nunes     -0.62
Hodg      -0.60
Mills     -0.60
EVs       -0.59
ESV       -0.59
TTL       -0.58
Kin       -0.58
Positive Logits
Token     Logit
catentry  0.97
wcsstore  0.84
pes       0.77
onduct    0.74
rag       0.71
ourage    0.71
pine      0.70
tnc       0.69
iquid     0.67
insula    0.67
Activations Density: 0.019%