Explanations
words related to geographical locations or proper nouns
references to fiber-related terms and entities
Negative Logits
pity        -0.69
undone      -0.62
isters      -0.61
ĸļ          -0.60
olulu       -0.58
abortion    -0.57
Shack       -0.57
nont        -0.57
Mosul       -0.57
ath         -0.57
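Some token strings in these lists, such as ĸļ, look like byte-level BPE vocabulary entries written with the GPT-2-style byte-to-Unicode mapping rather than corrupted text. That tokenizer is an assumption inferred from the shape of the strings; the page does not name the model. The minimal sketch below (plain Python, helper names are hypothetical) inverts that mapping so a token can be inspected as raw bytes or decoded to text.

```python
def bytes_to_unicode() -> dict[int, str]:
    """GPT-2-style byte -> printable-character table used by byte-level BPE."""
    # Bytes that are already printable map to the character with the same code point.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    table = {b: chr(b) for b in printable}
    # Every remaining byte (controls, space, 0x7F-0xA0, 0xAD) is shifted to 256 + n.
    n = 0
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + n)
            n += 1
    return table


# Inverse table: printable character -> original byte.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def token_to_bytes(token: str) -> bytes:
    """Recover the raw bytes behind a byte-level BPE token string."""
    return bytes(CHAR_TO_BYTE[ch] for ch in token)


def decode_token(token: str) -> str:
    """Decode one token; incomplete UTF-8 fragments become U+FFFD."""
    return token_to_bytes(token).decode("utf-8", errors="replace")


print(decode_token("ãĥ¼ãĤ¯"))  # ーク
print(token_to_bytes("ĸļ"))    # b'\x96\x9a' (two UTF-8 continuation bytes)
```

Under this mapping ĸļ is the bytes 0x96 0x9A, which are UTF-8 continuation bytes, so the token holds only the tail of a multi-byte character and has no standalone text form.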
Positive Logits
iber        1.26
tarian      1.07
glass       1.01
tarians     0.90
glas        0.86
ilib        0.83
oad         0.81
JJ          0.80
culus       0.79
ãĥ¼ãĤ¯ (ーク)  0.78
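Feature dashboards of this kind commonly obtain the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix and sorting the resulting per-token scores. The page does not state its exact procedure (for example, whether a final layer norm is folded in), so the sketch below is an assumption: a plain-Python version with a hypothetical 2-dimensional residual stream, a 4-token vocabulary, and made-up weights.

```python
# Toy stand-ins (hypothetical values): a 2-dimensional residual stream and a 4-token vocabulary.
VOCAB = ["iber", "glass", "pity", "the"]
W_DEC_ROW = [1.0, 0.5]          # hypothetical decoder direction for one feature
W_U = [                          # hypothetical unembedding, shape d_model x vocab
    [1.0, 0.5, -0.5, -0.1],
    [0.5, 1.0, -0.4, 0.1],
]


def feature_logits(decoder_row, unembed):
    """Project a feature's decoder direction through the unembedding: one score per token."""
    d_model, vocab_size = len(unembed), len(unembed[0])
    assert len(decoder_row) == d_model
    return [
        sum(decoder_row[i] * unembed[i][j] for i in range(d_model))
        for j in range(vocab_size)
    ]


def top_and_bottom(logits, vocab, k):
    """Return the k highest-scoring (positive) and k lowest-scoring (negative) tokens."""
    order = sorted(range(len(logits)), key=lambda j: logits[j])  # ascending
    bottom = [(vocab[j], logits[j]) for j in order[:k]]
    top = [(vocab[j], logits[j]) for j in reversed(order[-k:])]
    return top, bottom


top, bottom = top_and_bottom(feature_logits(W_DEC_ROW, W_U), VOCAB, k=2)
print("positive:", [tok for tok, _ in top])     # ['iber', 'glass']
print("negative:", [tok for tok, _ in bottom])  # ['pity', 'the']
```

With real weights the same two steps (one matrix-vector product, then a sort) would yield ten-entry lists like the ones above; the toy numbers here are not taken from this feature.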
Activation Density: 0.005%
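Activation density is usually read as the share of sampled tokens on which the feature fires at all. A density of 0.005% is 5e-5 as a fraction, or roughly one token in 20,000, so this is a very sparse feature. The sketch below assumes "fires" means an activation strictly above zero and uses a hypothetical activation sample; the dashboard's actual threshold and token corpus are not given on the page.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens on which the feature activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical sample: 5 firing tokens out of 100,000 gives 0.005%.
sample = [0.0] * 99_995 + [2.3] * 5
print(activation_density(sample))
```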