INDEX
Explanations
physical actions or movements
references to people
Negative Logits
shire        -0.63
distances    -0.61
attribution  -0.59
Deal         -0.59
details      -0.57
falsehood    -0.57
mirrors      -0.57
EDITION      -0.57
Ended        -0.55
Belfast      -0.54
Positive Logits
formance     1.34
cking        1.32
gging        1.28
eping        1.27
eking        1.26
eps          1.24
pperc        1.19
eling        1.17
ppy          1.14
aking        1.12
Activations Density 0.020%