INDEX
Explanations
information sources or attributions in a text
citations or references to sources
Negative Logits
monster       -0.71
choice        -0.70
mood          -0.70
turn          -0.68
pain          -0.68
affair        -0.66
act           -0.66
performance   -0.66
accent        -0.66
perfectly     -0.65
Positive Logits
Sources       4.27
Sources       2.44
References    2.13
Source        1.75
sources       1.51
ources        1.48
Resources     1.39
Appearances   1.38
Images        1.36
SOURCE        1.35
Activations Density 0.010%