INDEX
Explanations
proper nouns related to individuals and events
Negative Logits
Pigs            -0.77
steel           -0.64
arine           -0.64
tomorrow        -0.62
pite            -0.61
Runs            -0.61
geries          -0.60
wake            -0.60
advertisement   -0.60
rating          -0.60
Positive Logits
embarked        0.93
resorted        0.89
ventured        0.87
inexpl          0.86
encountered     0.82
reverted        0.81
confidently     0.81
decided         0.80
quietly         0.80
systematically  0.80
Activations Density: 4.123%