INDEX
Explanations
text related to card games such as Pokemon
Negative Logits
ging        -0.84
hran        -0.68
Fighter     -0.68
cheon       -0.64
given       -0.63
rency       -0.62
feeding     -0.61
izabeth     -0.60
comes       -0.60
lier        -0.60
Positive Logits
haps        1.04
optionally  0.98
hap         0.95
onna        0.93
be          0.91
contain     0.87
vary        0.84
differ      0.82
confuse     0.81
derive      0.81
Activations Density: 0.050%