INDEX
Explanations
words related to mental states or activities related to awareness and perception
references to consciousness
Negative Logits
PM            -0.70
Schne         -0.68
ctic          -0.67
rough         -0.66
Naz           -0.65
GER           -0.63
rug           -0.62
Rough         -0.61
unpublished   -0.61
bor           -0.59
Positive Logits
consciousness   1.17
Conscious       1.04
conscious       0.98
jriwal          0.98
awareness       0.94
ynes            0.91
edly            0.91
oenix           0.86
ibility         0.86
ysis            0.85
Activations Density: 0.009%