INDEX
Explanations
The word "get", at strong activations
Instances of the phrase "I get" or similar expressions indicating understanding or realization
Negative Logits
Madness      -0.62
depiction    -0.61
ridge        -0.61
cius         -0.60
Palestin     -0.59
Archdemon    -0.59
enclosure    -0.58
iege         -0.57
enture       -0.57
annex        -0.57
Positive Logits
rid             1.10
tin             1.02
TING            0.96
aways           0.85
bored           0.80
lucky           0.79
acquainted      0.76
terson          0.76
DragonMagazine  0.75
tired           0.73
Activation Density: 0.115%