INDEX
Explanations
references to Irish culture or characters
Negative Logits
Token      Logit
dana       -0.17
entin      -0.16
etas       -0.15
emer       -0.15
evil       -0.15
ilded      -0.14
enne       -0.14
Prefs      -0.14
icious     -0.14
icator     -0.14
Positive Logits
Token      Logit
regular     0.24
irr         0.21
ving        0.20
replace     0.19
Ir          0.19
ir          0.19
Irr         0.19
ritable     0.17
iom         0.17
ises        0.16
Activation Density: 0.010%