INDEX
Explanations
instances of the term "Fro" in various forms
Negative Logits
ascal     -0.16
uc        -0.15
rose      -0.15
Cush      -0.15
CHA       -0.15
ifu       -0.15
olds      -0.15
atron     -0.15
roma      -0.14
vers      -0.14
Positive Logits
sted      0.21
Fro       0.21
fro       0.18
eyle      0.17
icken     0.17
issement  0.17
heten     0.16
bite      0.16
izzy      0.16
lick      0.16
Activations Density 0.008%