Explanations
phrases indicating setting or location
phrases indicating scheduled events or releases
Negative Logits
Pastebin    -0.62
luence      -0.61
alez        -0.60
orean       -0.59
loo         -0.59
ocations    -0.57
00000       -0.57
mathemat    -0.55
illian      -0.55
udeau       -0.55
Positive Logits
tle         1.06
abl         1.01
aside       0.90
sail        0.90
forth       0.87
ters        0.85
tering      0.83
worms       0.83
upt         0.81
list        0.80
Activations Density 0.034%