INDEX
Explanations
mentions of names that start with "Ke" followed by a variety of letters
occurrences of specific names or identifiers
Negative Logits
Flavoring            -0.80
channelAvailability  -0.74
ashtra               -0.73
catentry             -0.68
illeg                -0.66
smuggled             -0.63
DRAG                 -0.60
specificity          -0.60
Lucius               -0.60
SHIP                 -0.60
Positive Logits
bye                  0.78
aston                0.77
chuk                 0.75
pie                  0.74
ury                  0.72
warm                 0.71
lein                 0.70
lyn                  0.69
uke                  0.68
kens                 0.67
Activations Density: 0.060%