INDEX
Explanations
references to the Hunger Games series and its characters
Negative Logits
phalt    -0.16
otel     -0.15
oton     -0.15
nors     -0.15
umer     -0.15
вав      -0.14
جاد      -0.14
718      -0.14
Sheet    -0.14
æ¥       -0.14
Positive Logits
Hunger       0.34
hunger       0.24
districts    0.23
Mock         0.23
District     0.21
District     0.21
MOCK         0.20
mocking      0.19
Mock         0.19
district     0.19
Activations Density 0.037%