INDEX
Explanations
references to the term "Ba" with varying activations
the occurrences of the name "Ba" in various contexts
Negative Logits
lessly      -0.88
wise        -0.82
otle        -0.77
ments       -0.72
lessness    -0.72
matic       -0.71
mented      -0.68
geist       -0.67
LAND        -0.67
ITAL        -0.66
Positive Logits
atar        0.99
uble        0.99
Ba          0.91
uman        0.86
iley        0.84
umann       0.84
ison        0.82
plin        0.81
ild         0.81
iting       0.80
Activations Density: 0.012%