INDEX
Explanations
proper nouns referring to specific entities like people, places, and organizations
references to groups of people and examples of their experiences or behaviors
Negative Logits
heid        -0.65
enium       -0.65
ysics       -0.62
otiation    -0.61
astern      -0.61
blast       -0.61
uild        -0.60
ustomed     -0.60
ulk         -0.60
]+          -0.58
Positive Logits
notable     0.95
include     0.84
noteworthy  0.83
orthy       0.82
involves    0.78
highlights  0.78
includes    0.78
included    0.72
%:          0.71
however     0.69
Activations Density: 0.381%