INDEX
Explanations
occurrences of the substring "ak" in various contexts
Negative Logits
ingham          -0.93
Accountability  -0.67
builders        -0.66
bury            -0.64
Walls           -0.63
enegger         -0.63
ィ               -0.63
feeding         -0.62
Stim            -0.61
laureate        -0.61
Positive Logits
ansas     0.84
AY        0.84
rily      0.81
umar      0.80
ANE       0.80
ril       0.79
ernels    0.78
orea      0.76
lein      0.76
atana     0.74
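The negative and positive logit lists are per-token effect scores. Dashboards of this kind commonly compute them by taking the dot product of the feature's decoder direction with each token's unembedding vector, then keeping the most negative and most positive scores. That procedure is an assumption here, not something this page confirms. The sketch below uses a toy dictionary of vectors in place of a real model's unembedding matrix; the function name `logit_effects` and the data layout are hypothetical.

```python
from typing import Dict, List, Tuple


def logit_effects(
    decoder_direction: List[float],
    unembedding: Dict[str, List[float]],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Rank tokens by the dot product of a (hypothetical) feature decoder
    direction with each token's unembedding vector.

    Returns (most_negative, most_positive), each a list of (token, score)
    pairs with at most k entries.
    """
    # Score every token: how strongly writing along the feature direction
    # pushes that token's logit up (positive) or down (negative).
    scores = {
        token: sum(d * u for d, u in zip(decoder_direction, column))
        for token, column in unembedding.items()
    }
    # Sort ascending so the most negative scores come first.
    ranked = sorted(scores.items(), key=lambda item: item[1])
    most_negative = ranked[:k]
    most_positive = ranked[::-1][:k]
    return most_negative, most_positive


if __name__ == "__main__":
    # Toy 2-dimensional example, not real model weights.
    toy_unembedding = {"a": [2.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    neg, pos = logit_effects([1.0, -1.0], toy_unembedding, k=1)
    print(neg)  # [('b', -1.0)]
    print(pos)  # [('a', 2.0)]
```

In a real model the vectors would come from the trained decoder and unembedding weights, and some pipelines apply a normalization step first; the ranking logic stays the same.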
Activation Density: 0.008%
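Activation density is usually reported as the percentage of sampled tokens on which the feature fires. A density of 0.008% therefore corresponds to roughly 8 activating tokens per 100,000. Here is a minimal sketch, assuming a feature "fires" when its activation is strictly greater than zero. The threshold and the helper name `activation_density` are assumptions and do not come from this page.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`.

    The threshold of 0.0 is an assumed definition of "firing".
    """
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


if __name__ == "__main__":
    # Synthetic data: 8 firing tokens out of 100,000.
    sample = [0.0] * 99_992 + [1.5] * 8
    print(activation_density(sample))  # 0.008
```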