Explanations
dates or references to historical or religious events
references to episodes or segments in a series
Negative Logits
FUL       -0.98
LESS      -0.87
veyard    -0.73
^^^^      -0.72
Pru       -0.72
ARDS      -0.70
hips      -0.69
WAYS      -0.69
âĺĨ       -0.69
''''      -0.69
Positive Logits
iphany    1.35
iscopal   1.23
istle     1.21
hemer     1.18
isodes    1.12
onymous   1.01
iph       1.00
igen      0.99
isc       0.98
oton      0.97
Activations Density: 0.012%
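
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed. It assumes that density means the share of sampled tokens on which the feature's activation is above zero; the page does not state its exact definition. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the fraction of tokens on which the
    feature fires at all; the dashboard may define it differently.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: the feature fires on 3 tokens out of 25,000.
sample = [0.0] * 24_997 + [0.4, 1.2, 2.7]
print(f"{activation_density(sample):.3f}%")  # prints 0.012%
```

A density this low means the feature is silent on nearly all tokens and fires only in narrow contexts, which fits the two explanations listed above.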