Explanations
fruit names or related terms, particularly focusing on apples
occurrences of the word "apple" and its related forms
NEGATIVE LOGITS
Token          Logit
ategory        -0.86
iltr           -0.80
uled           -0.77
USS            -0.76
ilitation      -0.75
interrupted    -0.70
enance         -0.68
rolled         -0.66
Simulation     -0.65
VK             -0.63
POSITIVE LOGITS
Token          Logit
cider          1.20
apple          1.15
fruit          1.12
apples         1.06
apple          1.04
baum           1.02
cone           1.00
juice          0.98
osite          0.93
peel           0.91
Activations Density 0.033%
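
As a rough reading of the density figure, assuming it denotes the fraction of sampled tokens on which the feature has a nonzero activation (the page itself does not define the term), 0.033% would correspond to about one activating token in every 3,000. A minimal Python sketch of that arithmetic follows; the function names are hypothetical and not part of any tool's API.

```python
def density_to_fraction(density_percent: float) -> float:
    """Convert a density given in percent to a plain fraction."""
    return density_percent / 100.0


def tokens_per_activation(density_percent: float) -> float:
    """Average number of tokens per activation.

    Assumes (this is an assumption, not stated in the source) that the
    density is the fraction of tokens with a nonzero activation.
    """
    fraction = density_to_fraction(density_percent)
    if fraction <= 0:
        raise ValueError("density must be positive")
    return 1.0 / fraction


if __name__ == "__main__":
    density = 0.033  # percent, as reported above
    print(f"about {tokens_per_activation(density):.0f} tokens per activation")
```

Under the same assumption, a feature this sparse fires on only a small handful of tokens per typical context window, which fits a narrow concept such as "apple" and its related forms.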