INDEX
Explanations
phrases related to importance or significance
words related to the importance or significance of events, actions, or concepts
Negative Logits
resp      -0.71
Balanced  -0.67
Bull      -0.66
cker      -0.66
uv        -0.64
ergy      -0.63
hma       -0.61
Jet       -0.61
IDES      -0.61
Rail      -0.60
Positive Logits
significance  1.33
importance    1.05
notation      0.93
uality        0.93
relevance     0.92
notations     0.92
implications  0.86
xual          0.84
proble        0.84
atility       0.81
Activations Density 0.005%