INDEX
Explanations
mentions of cheering or expressions of joy
instances of the word "cheer" and related forms
Negative Logits
Token       Logit
ngth        -0.76
arin        -0.72
enture      -0.68
ahime       -0.67
uture       -0.67
orgetown    -0.65
sequest     -0.64
fung        -0.62
wedge       -0.62
ighth       -0.61
Positive Logits
Token       Logit
leader      1.24
leading     1.19
cheer       1.17
leaders     1.11
fulness     1.04
cheering    1.01
cheered     0.99
lead        0.95
cheers      0.87
fully       0.86
Activations Density: 0.010%