INDEX
Explanations
lists of items
the word "list" in various contexts
Negative Logits
Token          Logit
selves         -0.72
whist          -0.65
Aber           -0.64
rhyth          -0.63
leisure        -0.60
Trade          -0.60
Train          -0.59
hearts         -0.59
vertisement    -0.58
recess         -0.58
Positive Logits
Token          Logit
erv            1.04
lists          0.81
witz           0.80
listings       0.79
listing        0.78
erve           0.77
list           0.77
lists          0.75
uably          0.75
criteria       0.75
Activations Density 0.027%