Explanations
items or entities listed in a ranking or list format
Negative Logits
Token      Logit
Ended      -0.73
aeus       -0.71
gery       -0.69
ousy       -0.68
entimes    -0.66
nces       -0.65
nce        -0.63
ideon      -0.63
matter     -0.61
amiya      -0.60
Positive Logits
Token      Logit
list       1.58
lists      1.32
checklist  1.16
radar      1.13
LIST       1.13
blacklist  1.11
list       1.06
charts     1.03
Lists      1.03
List       0.99
Activation density: 0.162%