Explanations
references to individuals and their roles in societal or systemic contexts
Negative Logits
фон         -0.14
∏           -0.14
orex        -0.14
utz         -0.14
üme         -0.14
atz         -0.14
Rib         -0.13
Fon         -0.13
dorf        -0.13
/REC        -0.13

Positive Logits
being        0.24
being        0.19
Being        0.17
ญ            0.15
CellValue    0.15
Being        0.15
プレ         0.14
ooke         0.14
having       0.14
Spl          0.14
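The Negative and Positive Logits lists show the tokens whose output logits this feature most decreases or increases. Dashboards of this kind often compute them by taking the dot product of the feature's decoder direction with each column of the model's unembedding matrix, then keeping the k most negative and k most positive scores. That procedure is an assumption here, since the source does not state it. The sketch below is pure Python on toy two-dimensional data. The function names `logit_effects` and `top_and_bottom`, and every number, are hypothetical.

```python
# Minimal sketch (assumption): logit effect of a feature = decoder_vec · unembed[:, j]
# for each vocabulary token j. All data below is a hypothetical toy example.

def logit_effects(decoder_vec, unembed, vocab):
    """Return (token, score) pairs; unembed is a d_model x d_vocab list of rows."""
    d_model = len(decoder_vec)
    scores = []
    for j, token in enumerate(vocab):
        # Dot product of the feature's decoder direction with the unembedding column for token j
        score = sum(decoder_vec[k] * unembed[k][j] for k in range(d_model))
        scores.append((token, score))
    return scores


def top_and_bottom(scores, k):
    """Split scores into the k most negative and the k most positive entries."""
    ranked = sorted(scores, key=lambda pair: pair[1])  # ascending by score
    negative = ranked[:k]                              # most negative first
    positive = ranked[::-1][:k]                        # most positive first
    return negative, positive


# Hypothetical toy model: d_model = 2, vocabulary of 5 tokens
vocab = ["being", "Being", "Fon", "dorf", "having"]
decoder_vec = [1.0, -0.5]
unembed = [
    [0.5, 0.25, -0.25, 0.0, 0.25],   # residual dimension 0
    [-0.5, 0.0, 0.25, 0.5, 0.5],     # residual dimension 1
]

scores = logit_effects(decoder_vec, unembed, vocab)
negative, positive = top_and_bottom(scores, 2)
print("Negative:", negative)
print("Positive:", positive)
```

With these toy values, "Fon" and "dorf" land in the negative list, and "being" and "Being" land in the positive list. A real model would use its full unembedding matrix, which typically has tens of thousands of columns. It would usually rely on a vectorized matrix product rather than Python loops.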
Activations Density 0.353%
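Activation density is usually read as the percentage of sampled token positions where the feature fires, meaning its activation is above zero. This definition and the zero threshold are assumptions; the source gives only the figure. Under that reading, 0.353% corresponds to roughly 353 active positions per 100,000 tokens, or about 1 token in 280. The helper `activation_density` below is hypothetical.

```python
# Minimal sketch (assumption): density = share of token positions with activation > threshold.

def activation_density(activations, threshold=0.0):
    """Return the percentage of positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 100,000 token positions, of which 353 are active
sample = [0.0] * 99_647 + [1.2] * 353
print(f"{activation_density(sample):.3f}%")
```

A low density like this marks a sparse feature. Its top activations and logit lists are therefore driven by a small, fairly specific set of contexts.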