Explanations
interesting or noteworthy information
instances of discovery and encounters with new information
Negative Logits
charge  -0.62
breat  -0.61
Getty  -0.61
consum  -0.60
heed  -0.60
ulum  -0.59
imeters  -0.57
MI  -0.55
stood  -0.55
sis  -0.55
Positive Logits
Ͻ  0.77
◼  0.75
wondering  0.74
NVIDIA  0.73
igslist  0.73
noticing  0.71
anew  0.69
?????-  0.69
EMOTE  0.67
surpr  0.67
Activation Density  0.248%