Explanations
specific symbols
instances of numerical values or counts in the text
Negative Logits
thrill      -0.78
neighb      -0.77
knockout    -0.76
lifes       -0.75
slam        -0.73
spitting    -0.73
wrapped     -0.72
grip        -0.72
agon        -0.71
casc        -0.70
Positive Logits
Additionally    1.50
However         1.48
References      1.43
Furthermore     1.43
Conclusion      1.42
Below           1.41
Therefore       1.39
Using           1.39
Generally       1.39
Although        1.38
Activations Density: 0.419%