INDEX
Explanations
phrases related to comparisons or contrasts
phrases indicating durations or time intervals
Negative Logits
butterflies  -0.76
lifes        -0.76
advoc        -0.73
ĵĺ           -0.72
rons         -0.72
strugg       -0.70
¬¼           -0.70
cons         -0.69
ysis         -0.69
basil        -0.68
Positive Logits
_-      0.89
gpu     0.79
SOURCE  0.76
âĸº     0.74
/-      0.73
[[      0.72
yes     0.72
->      0.71
BUT     0.71
why     0.70
Activations Density 0.037%