INDEX
Explanations
repetitive phrases indicating reasons or justifications
NEGATIVE LOGITS
saites            -0.45
wertig            -0.40
Савезне           -0.38
autogui           -0.36
Biôgrafia         -0.36
StoryboardSegue   -0.35
GEBURTS           -0.35
виправивши        -0.35
SAID              -0.34
↕                 -0.34
POSITIVE LOGITS
example     0.86
reasons     0.78
anyone      0.78
obvious     0.78
whatever    0.76
purposes    0.75
unately     0.70
instance    0.68
obvious     0.68
anybody     0.65
Activations Density 0.271%