INDEX
Explanations
references to personal memories or recollections
expressions of remembrance and nostalgia
Negative Logits
Hello     -0.77
Bang      -0.65
',        -0.62
ðŁĺ       -0.61
['        -0.61
)"        -0.60
Bang      -0.58
Drinking  -0.58
Champ     -0.58
Illegal   -0.56
Positive Logits
nonetheless     1.07
etheless        1.04
similarly       0.97
likewise        0.94
conspic         0.91
downright       0.88
equally         0.87
nevertheless    0.85
unwittingly     0.85
understandably  0.84
Activations Density 0.917%