INDEX
Explanations
references to cats or cat-related topics
Negative Logits
orer       -0.19
steen      -0.18
ourt       -0.17
gaard      -0.17
Atmos      -0.17
ureka      -0.16
Citadel    -0.16
hl         -0.16
icana      -0.15
iola       -0.15
Positive Logits
apult      0.33
égorie     0.29
nip        0.28
fish       0.23
elog       0.21
amar       0.20
enary      0.20
amount     0.20
ting       0.20
walk       0.20
Activations Density: 0.014%