Explanations
phrases indicating the revelation or clarification of information
phrases about illuminating or revealing information
Negative Logits
icians          -0.72
heid            -0.68
ournals         -0.67
teness          -0.66
erity           -0.65
DragonMagazine  -0.63
IAN             -0.61
HCR             -0.61
DonaldTrump     -0.60
£ı              -0.60
Positive Logits
urst            0.89
iencies         0.86
ritten          0.81
ench            0.80
itten           0.78
pload           0.77
tears           0.77
irts            0.76
irth            0.76
onds            0.76
Activations Density: 0.034%