INDEX
Explanations
proper nouns or names embedded in statements
statements of opinion or commentary on various topics
Negative Logits
actionGroup    -0.79
MpServer       -0.76
artifacts      -0.74
*/             -0.72
*/             -0.71
warr           -0.70
icter          -0.70
FTWARE         -0.69
Afgh           -0.68
corruption     -0.67
Positive Logits
quotes         1.05
reply          1.04
referring      0.98
replied        0.97
quotation      0.96
quote          0.94
replies        0.94
echoed         0.94
yrics          0.92
quoting        0.91
Activations Density 0.666%