INDEX
Explanations
English words
instances of the word "En"
Negative Logits
Token          Logit
PAC            -0.75
stretch        -0.74
remission      -0.69
AMA            -0.69
pound          -0.68
bonding        -0.68
circulation    -0.68
sport          -0.66
sports         -0.65
summer         -0.64
Positive Logits
Token          Logit
En             3.39
Enc            1.58
Ut             1.28
EN             1.27
Armor          1.24
Il             1.20
Requ           1.19
Ob             1.18
Inst           1.17
Eng            1.17
Activations Density 0.010%