INDEX
Explanations
the word "fire" or variants of it
instances of the word "fire."
Negative Logits
ļé           -0.82
ĸļ           -0.82
carbohyd     -0.73
achusetts    -0.71
cellul       -0.70
Īè           -0.69
romeda       -0.69
nont         -0.68
srf          -0.68
ockets       -0.67
Positive Logits
nces         1.00
cia          0.90
lli          0.86
ly           0.86
lessly       0.83
ttes         0.81
Dame         0.79
zza          0.79
les          0.79
tto          0.78
Activations Density 0.014%