INDEX
Explanations
references to reading or consuming content
references to reading and written material
Negative Logits
Token        Logit
aires        -0.79
akable       -0.74
rio          -0.67
ichick       -0.66
XL           -0.64
assador      -0.61
©¶æ¥µ        -0.61
Regist       -0.60
ãĥł          -0.60
ONSORED      -0.60
Positive Logits
Token           Logit
aloud           1.39
excerpts        0.95
DragonMagazine  0.91
Digest          0.89
passages        0.88
transcript      0.87
newspapers      0.86
journals        0.81
articles        0.81
transcripts     0.80
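Lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding and taking the most negative and most positive entries. The sketch below assumes exactly that: a decoder row `w_dec_row` of shape (d_model,) and an unembedding `W_U` of shape (d_model, vocab_size). The helper name `top_logits` and the toy data are hypothetical, not the dashboard's own code. The garbled-looking entries such as ãĥł are likely raw byte-level BPE token strings (as in GPT-2's vocabulary) rather than extraction errors.

```python
import numpy as np

def top_logits(w_dec_row, W_U, vocab, k=10):
    """Hypothetical helper: rank vocabulary tokens by how strongly a
    feature's decoder direction pushes their logits down or up.

    Assumes logit effect = w_dec_row @ W_U (no layer norm or bias applied).
    """
    logits = w_dec_row @ W_U                 # one value per vocab token
    order = np.argsort(logits)               # ascending: most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: d_model = 2, vocabulary of four tokens.
W_U = np.array([[1.0, 0.0, -1.0, 0.5],
                [0.0, 1.0,  0.0, 0.5]])
neg, pos = top_logits(np.array([1.0, 0.0]), W_U, ["a", "b", "c", "d"], k=2)
print(neg)  # [('c', -1.0), ('b', 0.0)]
print(pos)  # [('a', 1.0), ('d', 0.5)]
```

Real dashboards may fold in layer-norm scaling or normalize the decoder row first, so absolute values can differ from this plain dot product even when the token ranking is similar.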
Activation density: 0.141%
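Activation density is usually read as the share of sampled tokens on which the feature is active. A minimal sketch, assuming "active" means an activation strictly above a threshold of 0 and that `acts` holds the feature's activation on every token of some sample corpus (the function name and threshold are illustrative assumptions):

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Hypothetical helper: percentage of tokens where the feature fires.

    A token counts as firing when its activation is > threshold.
    """
    acts = np.asarray(acts, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# A feature that fires on 141 of 100,000 tokens has a density of 0.141%.
acts = np.zeros(100_000)
acts[:141] = 1.0
print(activation_density(acts))
```

Under this definition, 0.141% corresponds to roughly 1 token in 700, which marks the feature as fairly sparse.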