INDEX
Explanations
scientific research-related terms and data
Negative Logits
istry         -0.59
fencing       -0.53
nationalists  -0.52
drowning      -0.51
bleeding      -0.51
whales        -0.50
nationalism   -0.49
thinking      -0.49
sed           -0.48
pastoral      -0.48
Positive Logits
120           0.71
134           0.67
200           0.67
162           0.66
113           0.65
114           0.65
148           0.65
145           0.65
146           0.64
194           0.63
Activations Density: 13.222%