INDEX
Explanations
mentions of official statements or reports
Negative Logits
Constructed   -0.94
龍喚士        -0.88
soType        -0.86
Reviewed      -0.86
Factor        -0.83
Winner        -0.82
◼             -0.81
Rated         -0.78
Alias         -0.77
Owner         -0.77
Positive Logits
butterflies   0.77
memo          0.72
labs          0.72
books         0.69
monitoring    0.67
monkeys       0.67
commentary    0.66
squirrel      0.66
paper         0.65
forests       0.65
Activations Density: 0.085%