Explanation: mentions of tigers and related terms
Negative logits
  abilit     -0.18
  ategory    -0.15
  gens       -0.15
  tridge     -0.15
  agma       -0.14
  letcher    -0.14
  ataka      -0.14
  çĭIJ       -0.14
  orative    -0.14
  YTE        -0.14
Positive logits
  Woods       0.29
  cub         0.26
  Cub         0.22
  Tiger       0.22
  Claw        0.21
  Cubs        0.21
  claw        0.20
  woods       0.19
  hawk        0.19
  Paw         0.18
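Feature dashboards of this kind often rank tokens by projecting the feature's decoder direction through the model's unembedding matrix, then listing the most positive and most negative scores. The card does not state its method, so the sketch below is an assumption: the functions `logit_effects` and `top_and_bottom` are hypothetical, and the 2-dimensional vectors are made-up toy values, not the real model's weights.

```python
from typing import Dict, List, Sequence, Tuple

def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Score each token by the dot product of the (assumed) feature decoder
    direction with that token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, col))
            for tok, col in unembed.items()}

def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k highest-scoring (positive) and k lowest-scoring (negative) tokens."""
    positive = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)[:k]
    negative = sorted(effects.items(), key=lambda kv: kv[1])[:k]
    return positive, negative

# Hypothetical toy data: values chosen for illustration only.
decoder = [1.0, 0.5]
unembed = {
    "Tiger": [0.5, 0.5],
    "cub": [0.25, 0.5],
    "ategory": [-0.25, -0.5],
    "table": [0.0, 0.0],
}
pos, neg = top_and_bottom(logit_effects(decoder, unembed), k=2)
print(pos)  # [('Tiger', 0.75), ('cub', 0.5)]
print(neg)  # [('ategory', -0.5), ('table', 0.0)]
```

In a real model the vocabulary has tens of thousands of entries, so a vectorized matrix-vector product would replace the dictionary loop; the ranking logic stays the same.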
Activation density: 0.009%
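Activation density is commonly reported as the percentage of token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above a threshold of 0. The card does not specify the threshold or the token sample, and `activation_density` is a hypothetical helper. At 0.009%, the feature would fire on roughly 9 of every 100,000 tokens.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`
    (threshold of 0.0 is an assumption, not taken from the source)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)

# Hypothetical sample: the feature fires on 9 of 100,000 tokens.
acts = [0.0] * 99_991 + [1.3] * 9
print(activation_density(acts))  # 0.009
```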