INDEX
Explanations
instances of the word "ols" with varying activation levels
mentions of "pistols" in various contexts
Negative Logits
ŃĶ          -0.72
Rapp        -0.70
office      -0.66
liest       -0.62
FACE        -0.58
Pentagon    -0.58
French      -0.57
Chair       -0.57
Kenyan      -0.57
PRESS       -0.56
Positive Logits
ols         1.39
olics       1.04
ength       1.00
terday      0.99
ongs        0.91
olic        0.90
atile       0.90
ands        0.89
ipop        0.88
allery      0.85
Activations Density 0.010%