Explanations
items related to rankings or positions
Negative Logits
facts       -0.76
evidence    -0.71
Enjoy       -0.66
Contents    -0.65
letters     -0.65
guards      -0.64
Dates       -0.64
Amend       -0.64
rules       -0.62
lest        -0.62

Positive Logits
standpoint   1.41
vantage      1.15
perspective  1.08
distance     0.97
viewpoint    0.95
variety      0.87
helicopter   0.79
outset       0.79
perspect     0.78
POV          0.76
Activations Density: 0.049%