Explanations
phrases indicating a small amount or degree
mentions of the word "little" in varying contexts
Negative Logits
eneg      -0.91
itivity   -0.77
ocity     -0.77
ources    -0.74
idents    -0.73
itures    -0.73
hester    -0.73
itars     -0.70
anwhile   -0.70
ovies     -0.70
Positive Logits
bit        1.35
peek       0.86
helper     0.81
glimpse    0.80
tad        0.78
BIT        0.77
girl       0.75
harmless   0.74
hitter     0.72
patience   0.71
Activations Density 0.023%