Explanations
terms related to damage or harm
Negative Logits
Token     Logit
Oasis     -0.15
gebn      -0.15
illez     -0.14
ãģ£ãģ¡    -0.14
ITA       -0.14
iteit     -0.14
linger    -0.14
zeitig    -0.14
ICT       -0.13
332       -0.13
Positive Logits
Token     Logit
ion       1.03
ions      0.81
ION       0.77
ioned     0.65
iona      0.63
ion       0.63
-ion      0.62
ioni      0.61
ione      0.57
Ion       0.57
Activations Density 0.042%
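
For reference, a density figure like the one above is usually read as the fraction of tokens on which the feature's activation is non-zero, so 0.042% would be roughly 4 tokens in every 10,000. The sketch below shows that calculation under that assumption. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    fires at all; the dashboard may define it differently.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 tokens, of which 4 activate the feature.
sample = [0.0] * 9996 + [1.2, 0.3, 2.5, 0.7]
density = activation_density(sample)
print(f"{density:.3%}")
```

A strict greater-than comparison keeps exactly-zero activations out of the count, which matches the usual behaviour of ReLU-style features.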