INDEX
Explanations
phrases or terms related to "Top" rankings or lists
headings or titles highlighted in a list format
Negative Logits
amen         -0.79
unle         -0.68
compassion   -0.68
corridor     -0.66
rebel        -0.66
severity     -0.64
servant      -0.64
grievance    -0.63
suffering    -0.63
decree       -0.62
Positive Logits
Top          3.76
TOP          2.37
top          2.26
Top          2.25
Bottom       1.92
TOP          1.81
top          1.69
Bottom       1.46
tops         1.45
bottom       1.43
Activations Density: 0.010%