INDEX
Explanations
the name "Bab" or variations of it specified in the activations
occurrences of the name "Bab" and its variations
Negative Logits
backer    -0.79
PORT      -0.73
ICES      -0.73
IELD      -0.69
VICE      -0.69
dfx       -0.69
OPLE      -0.68
ICE       -0.68
MENT      -0.68
Lauder    -0.67
Positive Logits
alon      1.02
oru       0.92
cock      0.91
ulin      0.90
raham     0.89
Bab       0.87
oard      0.87
ule       0.86
aret      0.86
ulet      0.84
Activations Density: 0.018%