INDEX
Explanations
proper nouns or names starting with a capital letter
elements related to rankings and hierarchical positions
Negative Logits
γ             -0.82
avan          -0.80
interf        -0.72
react         -0.71
auga          -0.69
displayText   -0.69
iaz           -0.69
storms        -0.67
hran          -0.66
anol          -0.65
Positive Logits
Top        2.04
top        1.87
TOP        1.82
Top        1.76
TOP        1.65
ranking    1.62
top        1.62
ranked     1.59
Bottom     1.53
rankings   1.50
Activations Density 0.253%
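
The density figure is not defined on this page. A minimal sketch, assuming it means the fraction of token positions on which the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds threshold.

    Assumption: "density" means the share of tokens on which the feature fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up activations for one feature over 8 tokens (illustration only).
sample = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.0, 0.0]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 12.500%
```

Under this reading, 0.253% would correspond to the feature firing on roughly 1 in 400 tokens.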