INDEX
Explanations
statements of fact or truth
Negative Logits
itz      -0.16
ë²       -0.14
urt      -0.14
rape     -0.14
urf      -0.14
elsea    -0.13
osos     -0.13
bers     -0.13
elve     -0.13
issa     -0.13
Positive Logits
ually      0.17
heed       0.15
orial      0.15
etal       0.15
oad        0.14
orer       0.14
ekil       0.14
ViewById   0.14
ATHER      0.13
пÑĢоÑĩ       0.13
Activations Density 0.012%