INDEX
Explanations
phrases indicating results from research studies or scientific findings
Negative Logits
itar        -0.16
κά          -0.15
abe         -0.14
ihn         -0.14
ettel       -0.14
938         -0.13
910         -0.13
723         -0.13
utenberg    -0.12
внеÑģ       -0.12
Positive Logits
findings    0.81
results     0.74
results     0.60
finding     0.60
Find        0.57
find        0.56
Results     0.56
finds       0.55
-find       0.54
find        0.52
Activations Density 0.253%