INDEX
Explanations
various forms and derivatives of the word "fire."
Negative Logits
Token      Logit
ĸļ         -0.79
sylv       -0.77
iple       -0.73
onent      -0.72
ociated    -0.69
imore      -0.69
insula     -0.67
ermott     -0.66
iances     -0.66
acca       -0.66

Positive Logits
Token      Logit
lli        1.35
nces       1.21
tta        1.17
tto        1.13
lda        1.05
nder       1.04
lla        1.02
llo        0.99
nce        0.98
ttes       0.96
Activations Density 0.012%