Explanations
concepts related to misconceptions and fallacies in reasoning
Negative Logits
Token      Logit
purpoſe    -0.77
متعلقه     -0.76
Diwedd     -0.73
houſe      -0.72
itſelf     -0.69
ſelf       -0.68
Jefus      -0.67
uſe        -0.66
حياتها     -0.66
ſelves     -0.66
Positive Logits
Token          Logit
misunder       0.59
often          0.59
wrongly        0.54
wrong          0.54
ignor          0.53
Often          0.53
mistaken       0.53
misunderstand  0.51
sometimes      0.50
incorrect      0.50
Activation Density: 0.454%
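The page does not define the density figure. A common reading is the fraction of sampled token positions on which the feature fires at all. The sketch below assumes that definition; the function name, the zero threshold, and the toy data are all hypothetical and are not taken from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature is active; the source page does not state its exact definition.
    """
    if len(activations) == 0:
        raise ValueError("activations must not be empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 1000 token positions, 5 of which activate the feature.
toy_activations = [0.0] * 995 + [1.2, 0.4, 2.0, 0.7, 0.1]
density = activation_density(toy_activations)
print(f"Activation density: {density:.3%}")
```

Under this reading, a density of 0.454% would mean the feature fires on roughly 1 in 220 sampled tokens.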